INTRODUCTION

[...] confronted with a choice in regard to the values of that variate. If the
experiments be quite simple the question may be without great importance;
when their requirements as to time or expenditure come into account the problem
arises, how the observations should be chosen in order that a limited number of
them may give the maximum amount of knowledge. It clearly depends upon the
relationship between the observed quantity, which we shall name the primary
variate, and its essential circumstances, the secondary variates, and upon the
variation of the errors of the observations.





2 Choice in the Distribution of Observations 


When we deal with, for example, a linear function which it is possible to
observe with the same accuracy for all values of the indefinite variate we should
not hesitate to put the observations in two equally big groups as far apart from
each other as feasible. But if the standard deviation of the observations be a
function of the indefinite variate and increases with the distance from the middle
of the range, where is then the point in which the advantage of removing the two
groups of observations from each other just counterbalances the disadvantages of
increasing the error of observations? The problem becomes very complicated for
functions of higher degrees.

We shall in this memoir try to contribute to the solution in the case of polynomial
functions by examining the standard deviations of the adjusted and more
especially the interpolated values of such functions for different distributions of 
observations. Those values inside the working range of observations may be 
considered the sum of knowledge acquired by the experiments. The adjusted 
values outside the working range may probably in exceptional cases be of interest, 
but as only by some other type of experiment we can make sure that the form of 
function holds outside the range they are in ordinary cases without great value. 
We shall therefore aim at finding the distribution of observations which within 
the selected range gives the most satisfactory standard deviations of the adjusted 
values of the function. 

To consider the standard deviations satisfactory we must of course demand 
that they shall be as small as possible, and since a greater accuracy in one part 
may be expected to be accompanied by a smaller accuracy in another part we 
want them in addition to be as near constant as possible. In other words the 
curve of standard deviation with the lowest possible maximum value within the 
working range of observations is what we shall attempt to find. It appears that 
the distribution of observations which fulfils this demand consists of specially placed 
groups in number just sufficient to determine the constants of the function. We
shall accordingly pay attention also to the desirability usually present of ascertaining
the form of function by means of the observations. As might be expected
we find that the standard deviations obtained from a uniform continuous distribution
of observations increase towards the ends of the range. By choosing a
uniform continuous distribution with additional clusters at the ends of the range 
we shall try to find a compromise between the two desiderata of a low maximum 
of standard deviation and of a uniform distribution. 

The indefinite variate is supposed to have a vanishing error of observation 
compared with that of the principal variate. This error may be constant or varying 
with the indefinite variate, but in either case it is supposed to follow the typical 
law so closely that the method of least squares may satisfactorily be applied to the 
observations. After having found first the most advantageous distributions for
observations of functions up to the sixth degree with constant standard deviations
we examine the case for observations of functions of the first and of the
second degree which have standard deviations of the form $\sigma(1 + ax)$ and $\sigma(1 + ax^2)$.
If it is profitable to use the whole of the working range the latter distributions
are practically found from the former by multiplying their frequencies by the squared
standard deviations of the observations at the corresponding place. But in cases
where extrapolation is of advantage, and the whole range therefore not to be used,
the law of the frequencies has to be examined anew.

In Section VIII we find for the same two cases of varying error of observation
the distributions which make the standard deviation of each single constant of a
function of the first and of the second degree a minimum.


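I. Adjustment of a polynomial function of one variable; general distribution
of observations.

(1) Let $y_1, y_2, \ldots, y_N$ be $N$ observations of a function of the $n$th degree
taken at the points $x_1, x_2, \ldots, x_N$,

$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \qquad (1).$$

Let us assume that from earlier experience we know the standard deviation of an
observation of $y$ to be $\sigma\sqrt{f(x)}$. The method of least squares will then give us the
following system of normal equations in which the sums are to be extended over
all the observations:

$$\left.\begin{aligned}
S\left\{\frac{y_p}{f(x_p)}\right\} &= a_0 S\left\{\frac{1}{f(x_p)}\right\} + a_1 S\left\{\frac{x_p}{f(x_p)}\right\} + a_2 S\left\{\frac{x_p^2}{f(x_p)}\right\} + \cdots + a_n S\left\{\frac{x_p^n}{f(x_p)}\right\}\\
S\left\{\frac{y_p x_p}{f(x_p)}\right\} &= a_0 S\left\{\frac{x_p}{f(x_p)}\right\} + a_1 S\left\{\frac{x_p^2}{f(x_p)}\right\} + a_2 S\left\{\frac{x_p^3}{f(x_p)}\right\} + \cdots + a_n S\left\{\frac{x_p^{n+1}}{f(x_p)}\right\}\\
S\left\{\frac{y_p x_p^2}{f(x_p)}\right\} &= a_0 S\left\{\frac{x_p^2}{f(x_p)}\right\} + a_1 S\left\{\frac{x_p^3}{f(x_p)}\right\} + a_2 S\left\{\frac{x_p^4}{f(x_p)}\right\} + \cdots + a_n S\left\{\frac{x_p^{n+2}}{f(x_p)}\right\}\\
&\;\;\vdots\\
S\left\{\frac{y_p x_p^n}{f(x_p)}\right\} &= a_0 S\left\{\frac{x_p^n}{f(x_p)}\right\} + a_1 S\left\{\frac{x_p^{n+1}}{f(x_p)}\right\} + a_2 S\left\{\frac{x_p^{n+2}}{f(x_p)}\right\} + \cdots + a_n S\left\{\frac{x_p^{2n}}{f(x_p)}\right\}
\end{aligned}\right\} \qquad (2).$$

If $f(x)$ is 1 the sums are the moment coefficients of the places of observations
multiplied by $N$, and in the general case we shall for brevity put

$$S\left\{\frac{x_p^r}{f(x_p)}\right\} = N\,m_r .$$

By elimination of the $a$'s between (1) and (2) we find

$$\begin{vmatrix}
N y & 1 & x & \cdots & x^n\\
S\left\{\dfrac{y_p}{f(x_p)}\right\} & m_0 & m_1 & \cdots & m_n\\
S\left\{\dfrac{y_p x_p}{f(x_p)}\right\} & m_1 & m_2 & \cdots & m_{n+1}\\
S\left\{\dfrac{y_p x_p^2}{f(x_p)}\right\} & m_2 & m_3 & \cdots & m_{n+2}\\
\vdots & \vdots & \vdots & & \vdots\\
S\left\{\dfrac{y_p x_p^n}{f(x_p)}\right\} & m_n & m_{n+1} & \cdots & m_{2n}
\end{vmatrix} = 0 \qquad (3),$$

which determines the adjusted $y$ corresponding to the variable $x$.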
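(2) To find the standard deviation $\sigma_{y_r}$ of an adjusted $y_r$, it will be easiest to
start from the equations (2). If the first be multiplied by $\alpha_0$, the second by $\alpha_1$ and
so on before summing, and if we choose $\alpha_0, \alpha_1, \ldots, \alpha_n$ so that

$$\left.\begin{aligned}
\alpha_0 m_0 + \alpha_1 m_1 + \alpha_2 m_2 + \cdots + \alpha_n m_n &= 1\\
\alpha_0 m_1 + \alpha_1 m_2 + \alpha_2 m_3 + \cdots + \alpha_n m_{n+1} &= x_r\\
\alpha_0 m_2 + \alpha_1 m_3 + \alpha_2 m_4 + \cdots + \alpha_n m_{n+2} &= x_r^2\\
&\;\;\vdots\\
\alpha_0 m_n + \alpha_1 m_{n+1} + \alpha_2 m_{n+2} + \cdots + \alpha_n m_{2n} &= x_r^n
\end{aligned}\right\} \qquad (4),$$

we find that

$$y_r = \frac{1}{N}\, S\left\{\frac{y_p}{f(x_p)}\left[\alpha_0 + \alpha_1 x_p + \alpha_2 x_p^2 + \cdots + \alpha_n x_p^n\right]\right\},$$

and therefore

$$\sigma_{y_r}^2 = \frac{\sigma^2}{N^2}\, S\left\{\frac{1}{f(x_p)}\left[\alpha_0 + \alpha_1 x_p + \alpha_2 x_p^2 + \cdots + \alpha_n x_p^n\right]^2\right\}.$$

By multiplying out the square this may be written

$$\begin{aligned}
\sigma_{y_r}^2 = \frac{\sigma^2}{N}\,\bigl\{\,&\alpha_0\left[\alpha_0 m_0 + \alpha_1 m_1 + \alpha_2 m_2 + \cdots + \alpha_n m_n\right]\\
+\,&\alpha_1\left[\alpha_0 m_1 + \alpha_1 m_2 + \alpha_2 m_3 + \cdots + \alpha_n m_{n+1}\right]\\
+\,&\alpha_2\left[\alpha_0 m_2 + \alpha_1 m_3 + \alpha_2 m_4 + \cdots + \alpha_n m_{n+2}\right]\\
+\,&\cdots\\
+\,&\alpha_n\left[\alpha_0 m_n + \alpha_1 m_{n+1} + \alpha_2 m_{n+2} + \cdots + \alpha_n m_{2n}\right]\bigr\},
\end{aligned}$$

or applying (4)

$$\sigma_{y_r}^2 = \frac{\sigma^2}{N}\left(\alpha_0 + \alpha_1 x_r + \alpha_2 x_r^2 + \cdots + \alpha_n x_r^n\right) \qquad (5).$$

Hence $\sigma_{y_r}^2$ is found by elimination of the $\alpha$'s between (4) and (5), which results in

$$\begin{vmatrix}
\sigma_{y_r}^2\cdot\dfrac{N}{\sigma^2} & 1 & x_r & x_r^2 & \cdots & x_r^n\\
1 & m_0 & m_1 & m_2 & \cdots & m_n\\
x_r & m_1 & m_2 & m_3 & \cdots & m_{n+1}\\
x_r^2 & m_2 & m_3 & m_4 & \cdots & m_{n+2}\\
\vdots & \vdots & \vdots & \vdots & & \vdots\\
x_r^n & m_n & m_{n+1} & m_{n+2} & \cdots & m_{2n}
\end{vmatrix} = 0 \qquad (6).$$
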
This determinant is of fundamental importance for all the following work and 
it will be useful at once to examine it more closely. 
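
The steps (4) and (5) can be sketched numerically. The following is a minimal illustration, assuming $f(x) = 1$ and a distribution given as discrete groups; the function names `solve` and `variance_factor` are illustrative choices and not part of the memoir. It solves the system (4) for the $\alpha$'s in exact rational arithmetic and then evaluates (5), returning the factor $N\sigma_y^2/\sigma^2$.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix * z = rhs exactly by Gauss-Jordan elimination on Fractions."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and bring it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def variance_factor(places, counts, degree, x):
    """Return N * sigma_y^2 / sigma^2 at x, assuming f(x) = 1 (constant error)."""
    total = sum(counts)
    # Moment coefficients m_0 ... m_2n of the places of observation.
    moments = [
        sum(Fraction(c) * Fraction(p) ** k for p, c in zip(places, counts)) / total
        for k in range(2 * degree + 1)
    ]
    matrix = [[moments[i + j] for j in range(degree + 1)] for i in range(degree + 1)]
    powers = [Fraction(x) ** k for k in range(degree + 1)]
    alphas = solve(matrix, powers)                      # equations (4)
    return sum(a * v for a, v in zip(alphas, powers))   # equation (5)
```

For two equal groups at $-1$ and $1$ and a linear function this gives $1 + x^2$, in agreement with the result obtained later for the first degree.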


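(3) First however it may be pointed out that the standard deviation of any
other linear function

$$u = b_0 a_0 + b_1 a_1 + b_2 a_2 + \cdots + b_n a_n$$

of the constants of the function $y$ may be determined in quite the same way by

$$\begin{vmatrix}
\sigma_u^2\cdot\dfrac{N}{\sigma^2} & b_0 & b_1 & b_2 & \cdots & b_n\\
b_0 & m_0 & m_1 & m_2 & \cdots & m_n\\
b_1 & m_1 & m_2 & m_3 & \cdots & m_{n+1}\\
b_2 & m_2 & m_3 & m_4 & \cdots & m_{n+2}\\
\vdots & \vdots & \vdots & \vdots & & \vdots\\
b_n & m_n & m_{n+1} & m_{n+2} & \cdots & m_{2n}
\end{vmatrix} = 0 \qquad (7).$$

In particular $\sigma_{a_\nu}^2$ is found from

$$\begin{vmatrix}
\sigma_{a_\nu}^2\cdot\dfrac{N}{\sigma^2} & 0 & 0 & \cdots & 1 & \cdots & 0\\
0 & m_0 & m_1 & \cdots & m_\nu & \cdots & m_n\\
0 & m_1 & m_2 & \cdots & m_{\nu+1} & \cdots & m_{n+1}\\
\vdots & \vdots & \vdots & & \vdots & & \vdots\\
1 & m_\nu & m_{\nu+1} & \cdots & m_{2\nu} & \cdots & m_{n+\nu}\\
\vdots & \vdots & \vdots & & \vdots & & \vdots\\
0 & m_n & m_{n+1} & \cdots & m_{n+\nu} & \cdots & m_{2n}
\end{vmatrix} = 0 \qquad (8).$$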


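(4) Let us call a determinant, identical with that of (6) except that it has 0
instead of the element $\sigma_{y_r}^2\cdot\dfrac{N}{\sigma^2}$, $\Delta$; let $\Delta_{r,s}$ be its minor not containing the $r$th row
and $s$th column, and again let $\Delta_{r,s,p,q}$ be the minor of this not containing the $p$th
row and the $q$th column of $\Delta$. We then find from (8)

$$\sigma_{a_\nu}^2 = \frac{\sigma^2}{N}\cdot\frac{\Delta_{\nu+2,\nu+2,1,1}}{\Delta_{1,1}} \qquad (9).$$

With this notation we obtain from (6)

$$\sigma_{y_r}^2 = \frac{\sigma^2}{N}\left(-\frac{\Delta}{\Delta_{1,1}}\right) \qquad (10).$$

In the following we shall drop the index $r$ and indicate by ${}_n\sigma_y$ the standard
deviation of a $y$ adjusted by means of a function of the $n$th degree.

If we were dealing with a function of the $(n-1)$st degree and retained the observations
distributed as before we should find

$${}_{n-1}\sigma_y^2 = \frac{\sigma^2}{N}\left(-\frac{\Delta_{n+2,n+2}}{\Delta_{n+2,n+2,1,1}}\right),$$

and therefore

$${}_n\sigma_y^2 - {}_{n-1}\sigma_y^2 = \frac{\sigma^2}{N}\cdot\frac{\Delta_{1,1}\cdot\Delta_{n+2,n+2} - \Delta\cdot\Delta_{n+2,n+2,1,1}}{\Delta_{1,1}\cdot\Delta_{n+2,n+2,1,1}},$$

but $\Delta$ is orthosymmetrical and therefore the numerator of this fraction equals
$\Delta_{n+2,1}^2$ and

$${}_n\sigma_y^2 - {}_{n-1}\sigma_y^2 = \frac{\sigma^2}{N}\cdot\frac{\Delta_{n+2,1}^2}{\Delta_{1,1}\cdot\Delta_{n+2,n+2,1,1}}.$$

It was shown before that

$$\sigma_{a_n}^2 = \frac{\sigma^2}{N}\cdot\frac{\Delta_{n+2,n+2,1,1}}{\Delta_{1,1}},$$

hence $\Delta_{1,1}$ and $\Delta_{n+2,n+2,1,1}$ have the same sign, and ${}_n\sigma_y^2 - {}_{n-1}\sigma_y^2$ is therefore a square
of a function of $x$. In the same way we can express ${}_{n-1}\sigma_y^2 - {}_{n-2}\sigma_y^2$ and thus further
down all the differences till ${}_0\sigma_y^2 = \dfrac{\sigma^2}{N}\cdot\dfrac{1}{m_0}$, by which means ${}_n\sigma_y^2$ is developed in a sum
of squares and takes the shape

$$\begin{aligned}
{}_n\sigma_y^2 = \frac{\sigma^2}{N}\Biggl\{ & \frac{1}{m_0}
+ \frac{\begin{vmatrix} 1 & m_0\\ x & m_1 \end{vmatrix}^2}{m_0\begin{vmatrix} m_0 & m_1\\ m_1 & m_2 \end{vmatrix}}
+ \frac{\begin{vmatrix} 1 & m_0 & m_1\\ x & m_1 & m_2\\ x^2 & m_2 & m_3 \end{vmatrix}^2}{\begin{vmatrix} m_0 & m_1\\ m_1 & m_2 \end{vmatrix}\cdot\begin{vmatrix} m_0 & m_1 & m_2\\ m_1 & m_2 & m_3\\ m_2 & m_3 & m_4 \end{vmatrix}} + \cdots\\
& + \frac{\begin{vmatrix} 1 & m_0 & m_1 & \cdots & m_{n-1}\\ x & m_1 & m_2 & \cdots & m_n\\ \vdots & \vdots & \vdots & & \vdots\\ x^n & m_n & m_{n+1} & \cdots & m_{2n-1} \end{vmatrix}^2}{\begin{vmatrix} m_0 & m_1 & \cdots & m_{n-1}\\ m_1 & m_2 & \cdots & m_n\\ \vdots & \vdots & & \vdots\\ m_{n-1} & m_n & \cdots & m_{2n-2} \end{vmatrix}\cdot\begin{vmatrix} m_0 & m_1 & \cdots & m_n\\ m_1 & m_2 & \cdots & m_{n+1}\\ \vdots & \vdots & & \vdots\\ m_n & m_{n+1} & \cdots & m_{2n} \end{vmatrix}} \Biggr\}
\end{aligned} \qquad (11).$$

(5) It is seen from (11) that ${}_n\sigma_y^2$ is a function of the $2n$th degree of $x$.
The coefficient of $x^{2n}$ is $\dfrac{\Delta_{n+2,n+2,1,1}}{\Delta_{1,1}}$, which, as was just seen, is the factor with which
$\dfrac{\sigma^2}{N}$ should be multiplied in order to give $\sigma_{a_n}^2$; it is therefore positive and can never vanish.

If all the $m$'s with odd indices are zero it is seen from (6) that $\sigma_y^2$ is a function
of $x^2$. This is, at least in theory, a natural thing to aim at, since our general
purpose is to find a curve for $\sigma_y^2$ giving as nearly as possible a constant value for
$\sigma_y$ throughout the range.

Rearranging the order of rows and columns in (6) we get, when all $m_{2r+1} = 0$
and $n = 2p$,

$$\begin{vmatrix}
\sigma_y^2\cdot\dfrac{N}{\sigma^2} & 1 & x^2 & \cdots & x^{2p} & x & x^3 & \cdots & x^{2p-1}\\
1 & m_0 & m_2 & \cdots & m_{2p} & 0 & 0 & \cdots & 0\\
x^2 & m_2 & m_4 & \cdots & m_{2p+2} & 0 & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\
x^{2p} & m_{2p} & m_{2p+2} & \cdots & m_{4p} & 0 & 0 & \cdots & 0\\
x & 0 & 0 & \cdots & 0 & m_2 & m_4 & \cdots & m_{2p}\\
x^3 & 0 & 0 & \cdots & 0 & m_4 & m_6 & \cdots & m_{2p+2}\\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\
x^{2p-1} & 0 & 0 & \cdots & 0 & m_{2p} & m_{2p+2} & \cdots & m_{4p-2}
\end{vmatrix} = 0 \qquad (12),$$

from which we find

$${}_{2p}\sigma_y^2 = -\frac{\sigma^2}{N}\left\{
\frac{\begin{vmatrix} 0 & 1 & x^2 & \cdots & x^{2p}\\ 1 & m_0 & m_2 & \cdots & m_{2p}\\ x^2 & m_2 & m_4 & \cdots & m_{2p+2}\\ \vdots & \vdots & \vdots & & \vdots\\ x^{2p} & m_{2p} & m_{2p+2} & \cdots & m_{4p} \end{vmatrix}}{\begin{vmatrix} m_0 & m_2 & \cdots & m_{2p}\\ m_2 & m_4 & \cdots & m_{2p+2}\\ \vdots & \vdots & & \vdots\\ m_{2p} & m_{2p+2} & \cdots & m_{4p} \end{vmatrix}}
+ x^2\,\frac{\begin{vmatrix} 0 & 1 & x^2 & \cdots & x^{2p-2}\\ 1 & m_2 & m_4 & \cdots & m_{2p}\\ x^2 & m_4 & m_6 & \cdots & m_{2p+2}\\ \vdots & \vdots & \vdots & & \vdots\\ x^{2p-2} & m_{2p} & m_{2p+2} & \cdots & m_{4p-2} \end{vmatrix}}{\begin{vmatrix} m_2 & m_4 & \cdots & m_{2p}\\ m_4 & m_6 & \cdots & m_{2p+2}\\ \vdots & \vdots & & \vdots\\ m_{2p} & m_{2p+2} & \cdots & m_{4p-2} \end{vmatrix}}
\right\} \qquad (13).$$

For a function of the degree $2p - 1$ we get the same determinant as in (12)
except that it does not contain the row and column in which $x^{2p}$ is found.
Hence we find

$${}_{2p-1}\sigma_y^2 = -\frac{\sigma^2}{N}\left\{
\frac{\begin{vmatrix} 0 & 1 & x^2 & \cdots & x^{2p-2}\\ 1 & m_0 & m_2 & \cdots & m_{2p-2}\\ x^2 & m_2 & m_4 & \cdots & m_{2p}\\ \vdots & \vdots & \vdots & & \vdots\\ x^{2p-2} & m_{2p-2} & m_{2p} & \cdots & m_{4p-4} \end{vmatrix}}{\begin{vmatrix} m_0 & m_2 & \cdots & m_{2p-2}\\ m_2 & m_4 & \cdots & m_{2p}\\ \vdots & \vdots & & \vdots\\ m_{2p-2} & m_{2p} & \cdots & m_{4p-4} \end{vmatrix}}
+ x^2\,\frac{\begin{vmatrix} 0 & 1 & x^2 & \cdots & x^{2p-2}\\ 1 & m_2 & m_4 & \cdots & m_{2p}\\ x^2 & m_4 & m_6 & \cdots & m_{2p+2}\\ \vdots & \vdots & \vdots & & \vdots\\ x^{2p-2} & m_{2p} & m_{2p+2} & \cdots & m_{4p-2} \end{vmatrix}}{\begin{vmatrix} m_2 & m_4 & \cdots & m_{2p}\\ m_4 & m_6 & \cdots & m_{2p+2}\\ \vdots & \vdots & & \vdots\\ m_{2p} & m_{2p+2} & \cdots & m_{4p-2} \end{vmatrix}}
\right\} \qquad (14).$$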


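(6) The last two determinant ratios of (13) and (14) are identical, and when
the numerator of the first fraction of (13) is indicated by $\delta$ we therefore find

$${}_{2p}\sigma_y^2 - {}_{2p-1}\sigma_y^2 = -\frac{\sigma^2}{N}\left\{\frac{\delta}{\delta_{1,1}} - \frac{\delta_{p+2,p+2}}{\delta_{1,1,p+2,p+2}}\right\},$$

or as $\delta$ is orthosymmetrical and therefore

$$\delta_{1,1}\cdot\delta_{p+2,p+2} - \delta\cdot\delta_{1,1,p+2,p+2} = \delta_{p+2,1}^2,$$

$${}_{2p}\sigma_y^2 - {}_{2p-1}\sigma_y^2 = \frac{\sigma^2}{N}\cdot\frac{\delta_{p+2,1}^2}{\delta_{1,1}\cdot\delta_{1,1,p+2,p+2}}.$$

Comparing ${}_{2p-1}\sigma_y^2$ and ${}_{2p-2}\sigma_y^2$ we see that they have the first determinant ratio
in common and that when $\gamma$ stands for the numerator of the other fraction of
${}_{2p-1}\sigma_y^2$ we have

$${}_{2p-1}\sigma_y^2 - {}_{2p-2}\sigma_y^2 = -\frac{\sigma^2}{N}\,x^2\left\{\frac{\gamma}{\gamma_{1,1}} - \frac{\gamma_{p+1,p+1}}{\gamma_{p+1,p+1,1,1}}\right\},$$

or again, since $\gamma$ is orthosymmetrical,

$${}_{2p-1}\sigma_y^2 - {}_{2p-2}\sigma_y^2 = \frac{\sigma^2}{N}\,x^2\cdot\frac{\gamma_{p+1,1}^2}{\gamma_{1,1}\cdot\gamma_{p+1,p+1,1,1}}.$$

The general formula (11) hence, when every $m_{2r+1} = 0$, takes the shape

$$\begin{aligned}
{}_{2p}\sigma_y^2 = \frac{\sigma^2}{N}\Biggl\{ & \frac{1}{m_0} + \frac{x^2}{m_2}
+ \frac{\begin{vmatrix} 1 & m_0\\ x^2 & m_2 \end{vmatrix}^2}{m_0\begin{vmatrix} m_0 & m_2\\ m_2 & m_4 \end{vmatrix}}
+ x^2\,\frac{\begin{vmatrix} 1 & m_2\\ x^2 & m_4 \end{vmatrix}^2}{m_2\begin{vmatrix} m_2 & m_4\\ m_4 & m_6 \end{vmatrix}} + \cdots\\
& + x^2\,\frac{\begin{vmatrix} 1 & m_2 & \cdots & m_{2p-2}\\ x^2 & m_4 & \cdots & m_{2p}\\ \vdots & \vdots & & \vdots\\ x^{2p-2} & m_{2p} & \cdots & m_{4p-4} \end{vmatrix}^2}{\begin{vmatrix} m_2 & m_4 & \cdots & m_{2p-2}\\ m_4 & m_6 & \cdots & m_{2p}\\ \vdots & \vdots & & \vdots\\ m_{2p-2} & m_{2p} & \cdots & m_{4p-6} \end{vmatrix}\cdot\begin{vmatrix} m_2 & m_4 & \cdots & m_{2p}\\ m_4 & m_6 & \cdots & m_{2p+2}\\ \vdots & \vdots & & \vdots\\ m_{2p} & m_{2p+2} & \cdots & m_{4p-2} \end{vmatrix}}\\
& + \frac{\begin{vmatrix} 1 & m_0 & \cdots & m_{2p-2}\\ x^2 & m_2 & \cdots & m_{2p}\\ \vdots & \vdots & & \vdots\\ x^{2p} & m_{2p} & \cdots & m_{4p-2} \end{vmatrix}^2}{\begin{vmatrix} m_0 & m_2 & \cdots & m_{2p-2}\\ m_2 & m_4 & \cdots & m_{2p}\\ \vdots & \vdots & & \vdots\\ m_{2p-2} & m_{2p} & \cdots & m_{4p-4} \end{vmatrix}\cdot\begin{vmatrix} m_0 & m_2 & \cdots & m_{2p}\\ m_2 & m_4 & \cdots & m_{2p+2}\\ \vdots & \vdots & & \vdots\\ m_{2p} & m_{2p+2} & \cdots & m_{4p} \end{vmatrix}} \Biggr\}
\end{aligned} \qquad (15).$$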





(7) Before leaving the general case and treating special distributions of
observations three auxiliary propositions shall be proved. We shall first prove that
the curve of ${}_n\sigma_y^2$ can never be entirely below $\dfrac{\sigma^2}{N}\cdot\dfrac{n+1}{m_0}$. With that purpose ${}_n\sigma_y^2$ will
be summed over all the places of observation with the weight $\dfrac{1}{f(x)}$, i.e. for a
continuous distribution of observations, the expression $\dfrac{\psi(x)}{f(x)}\,\sigma_y^2\,dx$, where $\psi(x)$ is
the number of observations, will be integrated over the range of observations.

Looking first at the numerator of the last term of (11) we find that it can be
expanded into

$$(-1)^{n+1}\left\{
\begin{vmatrix} 1 & m_0 & \cdots & m_{n-1}\\ x & m_1 & \cdots & m_n\\ \vdots & \vdots & & \vdots\\ x^n & m_n & \cdots & m_{2n-1} \end{vmatrix}\times\Delta_{1,n+2,2,1}
+ \begin{vmatrix} x & m_0 & \cdots & m_{n-1}\\ x^2 & m_1 & \cdots & m_n\\ \vdots & \vdots & & \vdots\\ x^{n+1} & m_n & \cdots & m_{2n-1} \end{vmatrix}\times\Delta_{1,n+2,3,1}
+ \cdots
+ \begin{vmatrix} x^n & m_0 & \cdots & m_{n-1}\\ x^{n+1} & m_1 & \cdots & m_n\\ \vdots & \vdots & & \vdots\\ x^{2n} & m_n & \cdots & m_{2n-1} \end{vmatrix}\times\Delta_{1,n+2,n+2,1}
\right\}.$$

Now $\displaystyle\int\frac{\psi(x)}{f(x)}\,x^r\,dx$ integrated over all the observations is what we have called
$N\,m_r$. When integrating the determinants we therefore find that the first $n$ of
them will vanish, two of their columns consisting of proportional elements, whereas
the integral of the last determinant is

$$N\begin{vmatrix} m_n & m_0 & \cdots & m_{n-1}\\ m_{n+1} & m_1 & \cdots & m_n\\ \vdots & \vdots & & \vdots\\ m_{2n} & m_n & \cdots & m_{2n-1} \end{vmatrix} = (-1)^n\,N\,\Delta_{1,1}.$$

As $\Delta_{1,n+2,n+2,1} = -\Delta_{n+2,n+2,1,1}$, the integral of the last term of (11) equals $N$.
The integration of the other terms, including the first, gives the same result so that

$$\int {}_n\sigma_y^2\,\frac{\psi(x)}{f(x)}\,dx = \sigma^2\,(n+1),$$

and as

$$\int\frac{\psi(x)}{f(x)}\,dx = N\,m_0,$$

the mean value of ${}_n\sigma_y^2$ calculated in this special way is

$$\frac{\sigma^2}{N}\cdot\frac{n+1}{m_0}.$$

It is therefore clear either that ${}_n\sigma_y^2$ must at all the places of observation be
equal to $\dfrac{\sigma^2}{N}\cdot\dfrac{n+1}{m_0}$ or ${}_n\sigma_y^2$ must at some of these places be greater. The first case
cannot be realised by a distribution of which any part is continuous, as ${}_n\sigma_y^2$ is proved
to be of the $2n$th degree in $x$. If therefore we could find a distribution consisting
of groups of observations for which at all the places of observation ${}_n\sigma_y^2$ was equal
to $\dfrac{\sigma^2}{N}\cdot\dfrac{n+1}{m_0}$, and if further we could choose the places of observation so that ${}_n\sigma_y^2$
at all other places within the range of observations was smaller than that value,
we should know that no other distribution of observations with that value for $m_0$
could provide a curve of standard deviation with a lower maximum.
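
The proposition just proved can be checked numerically for a distribution in groups. The sketch below is an illustration only, assuming $f(x) = 1$ (so that $m_0 = 1$); the helper names are hypothetical. It sums $N\,{}_n\sigma_y^2/\sigma^2$ over the places of observation, weighted with the number of observations at each place, and the result should be $N(n+1)$ whatever the places and group sizes, provided there are at least $n + 1$ distinct places.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix * z = rhs exactly by Gauss-Jordan elimination on Fractions."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def variance_factor(places, counts, degree, x):
    """Return N * sigma_y^2 / sigma^2 at x for f(x) = 1, via (4) and (5)."""
    total = sum(counts)
    moments = [
        sum(Fraction(c) * Fraction(p) ** k for p, c in zip(places, counts)) / total
        for k in range(2 * degree + 1)
    ]
    matrix = [[moments[i + j] for j in range(degree + 1)] for i in range(degree + 1)]
    powers = [Fraction(x) ** k for k in range(degree + 1)]
    alphas = solve(matrix, powers)
    return sum(a * v for a, v in zip(alphas, powers))


def weighted_sum(places, counts, degree):
    """Sum of the variance factor over the places, weighted by group size."""
    return sum(
        c * variance_factor(places, counts, degree, p)
        for p, c in zip(places, counts)
    )
```

With ten observations in four unequal groups and a function of the second degree, the weighted sum is $10 \times 3 = 30$; with three equal groups the factor equals $n + 1 = 3$ at each place of observation.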


If the standard deviation of the observations be constant and equal $\sigma$, $f(x)$
equals 1, and so does $m_0$. After what we have just proved the maximum of the ${}_n\sigma_y^2$
curve cannot then be lower than $\dfrac{\sigma^2}{N}(n+1)$. Now when we choose to distribute our
$N$ observations in $(n+1)$ equally big groups the adjusted $y$ at each of these $(n+1)$
places will be the mean of the observations and its squared standard deviation will
be $\dfrac{\sigma^2}{N}(n+1)$. Hence our problem is reduced to find out how to arrange a table of
$(n+1)$ values of a function of the $n$th degree to make the squared standard deviation
of any interpolation result inside the range smaller than the squared standard
deviation of the values of the table. It will be seen in what follows that this can
up to $n$ equal 6 (that is so far as the problem here has been investigated) be
obtained by one and only one form of grouping.

When the standard deviation of the observations varies over the range, $m_0$
varies with the different distributions, and we cannot use the same method for
finding the best distribution. It even appears that the best distribution has not
always its maxima at the places of observation.


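(8) A second problem which we want to consider here is the condition for two
adjusted $y$'s being uncorrelated. In the beginning of this section it has been shown
that the adjusted $y_q$ at the argument $x_q$ is

$$y_q = \frac{1}{N}\,S\left\{\frac{y_p}{f(x_p)}\left[\alpha_0 + \alpha_1 x_p + \alpha_2 x_p^2 + \cdots + \alpha_n x_p^n\right]\right\},$$

when

$$\left.\begin{aligned}
\alpha_0 m_0 + \alpha_1 m_1 + \alpha_2 m_2 + \cdots + \alpha_n m_n &= 1\\
\alpha_0 m_1 + \alpha_1 m_2 + \alpha_2 m_3 + \cdots + \alpha_n m_{n+1} &= x_q\\
\alpha_0 m_2 + \alpha_1 m_3 + \alpha_2 m_4 + \cdots + \alpha_n m_{n+2} &= x_q^2\\
&\;\;\vdots\\
\alpha_0 m_n + \alpha_1 m_{n+1} + \alpha_2 m_{n+2} + \cdots + \alpha_n m_{2n} &= x_q^n
\end{aligned}\right\} \qquad (16).$$

Let $y_t$ be another adjusted value, then

$$y_t = \frac{1}{N}\,S\left\{\frac{y_p}{f(x_p)}\left[\gamma_0 + \gamma_1 x_p + \gamma_2 x_p^2 + \cdots + \gamma_n x_p^n\right]\right\},$$

where

$$\left.\begin{aligned}
\gamma_0 m_0 + \gamma_1 m_1 + \gamma_2 m_2 + \cdots + \gamma_n m_n &= 1\\
\gamma_0 m_1 + \gamma_1 m_2 + \gamma_2 m_3 + \cdots + \gamma_n m_{n+1} &= x_t\\
\gamma_0 m_2 + \gamma_1 m_3 + \gamma_2 m_4 + \cdots + \gamma_n m_{n+2} &= x_t^2\\
&\;\;\vdots\\
\gamma_0 m_n + \gamma_1 m_{n+1} + \gamma_2 m_{n+2} + \cdots + \gamma_n m_{2n} &= x_t^n
\end{aligned}\right\} \qquad (17).$$

Hence the condition that $y_q$ and $y_t$ are uncorrelated is, since the squared standard
deviation of the observed $y_p$ equals $\sigma^2 f(x_p)$,

$$S\left\{\frac{1}{f(x_p)}\left[\alpha_0 + \alpha_1 x_p + \alpha_2 x_p^2 + \cdots + \alpha_n x_p^n\right]\cdot\left[\gamma_0 + \gamma_1 x_p + \gamma_2 x_p^2 + \cdots + \gamma_n x_p^n\right]\right\} = 0,$$

or

$$\begin{aligned}
& S\left\{\frac{\gamma_0}{f(x_p)}\left[\alpha_0 + \alpha_1 x_p + \alpha_2 x_p^2 + \cdots + \alpha_n x_p^n\right]\right\}
+ S\left\{\frac{\gamma_1}{f(x_p)}\left[\alpha_0 x_p + \alpha_1 x_p^2 + \alpha_2 x_p^3 + \cdots + \alpha_n x_p^{n+1}\right]\right\}\\
& + S\left\{\frac{\gamma_2}{f(x_p)}\left[\alpha_0 x_p^2 + \alpha_1 x_p^3 + \alpha_2 x_p^4 + \cdots + \alpha_n x_p^{n+2}\right]\right\} + \cdots
+ S\left\{\frac{\gamma_n}{f(x_p)}\left[\alpha_0 x_p^n + \alpha_1 x_p^{n+1} + \alpha_2 x_p^{n+2} + \cdots + \alpha_n x_p^{2n}\right]\right\} = 0.
\end{aligned}$$

Remembering that $S\left\{\dfrac{x_p^r}{f(x_p)}\right\} = N\,m_r$ and applying the relations (16) this reduces to

$$\gamma_0 + \gamma_1 x_q + \gamma_2 x_q^2 + \cdots + \gamma_n x_q^n = 0,$$

from which the $\gamma$'s are eliminated by (17).

$$\begin{vmatrix}
0 & 1 & x_q & x_q^2 & \cdots & x_q^n\\
1 & m_0 & m_1 & m_2 & \cdots & m_n\\
x_t & m_1 & m_2 & m_3 & \cdots & m_{n+1}\\
x_t^2 & m_2 & m_3 & m_4 & \cdots & m_{n+2}\\
\vdots & \vdots & \vdots & \vdots & & \vdots\\
x_t^n & m_n & m_{n+1} & m_{n+2} & \cdots & m_{2n}
\end{vmatrix} = 0 \qquad (18)$$

is therefore the condition that $y_q$ and $y_t$ are uncorrelated.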
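
The condition (18) amounts to the vanishing of the bilinear form obtained by solving (17) for the $\gamma$'s and evaluating $\gamma_0 + \gamma_1 x + \cdots + \gamma_n x^n$ at the first argument. The following minimal sketch, assuming $f(x) = 1$ and a distribution in discrete groups (function names are illustrative, not the author's), computes that quantity, which is $N/\sigma^2$ times the covariance of the two adjusted values.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix * z = rhs exactly by Gauss-Jordan elimination on Fractions."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def covariance_factor(places, counts, degree, x_first, x_second):
    """Return N / sigma^2 times the covariance of the adjusted values at two arguments."""
    total = sum(counts)
    moments = [
        sum(Fraction(c) * Fraction(p) ** k for p, c in zip(places, counts)) / total
        for k in range(2 * degree + 1)
    ]
    matrix = [[moments[i + j] for j in range(degree + 1)] for i in range(degree + 1)]
    # gammas solve the system (17) for the second argument.
    gammas = solve(matrix, [Fraction(x_second) ** k for k in range(degree + 1)])
    # The reduced condition: gamma_0 + gamma_1 x + ... evaluated at the first argument.
    return sum(g * Fraction(x_first) ** k for k, g in enumerate(gammas))
```

For three equal groups at $-1$, $0$, $1$ and a function of the second degree the adjusted values at the places of observation are the group means, so the factor vanishes for any two distinct places, while for arguments such as $\tfrac12$ and $1$ it does not.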
(9) Returning to the formula (11) for ${}_n\sigma_y^2$ written as a sum of squares we shall
now prove that the $(p+1)$st term of this put equal to zero determines a set of $p$ abscissae
the adjusted $y$'s of which are mutually uncorrelated both for a function of the $p$th and
the $(p-1)$st degree.

The condition for $y_q$ and $y_t$ corresponding to the arguments $x_q$ and $x_t$ being
uncorrelated is for a function of the $(p-1)$st degree

$$\begin{vmatrix}
0 & 1 & x_q & \cdots & x_q^{p-1}\\
1 & m_0 & m_1 & \cdots & m_{p-1}\\
x_t & m_1 & m_2 & \cdots & m_p\\
\vdots & \vdots & \vdots & & \vdots\\
x_t^{p-1} & m_{p-1} & m_p & \cdots & m_{2p-2}
\end{vmatrix} = 0,$$

and for the same distribution of observations and for a function of the $p$th degree
the condition is

$$\begin{vmatrix}
0 & 1 & x_q & \cdots & x_q^{p}\\
1 & m_0 & m_1 & \cdots & m_{p}\\
x_t & m_1 & m_2 & \cdots & m_{p+1}\\
\vdots & \vdots & \vdots & & \vdots\\
x_t^{p} & m_{p} & m_{p+1} & \cdots & m_{2p}
\end{vmatrix} = 0.$$

Putting

$$\begin{vmatrix}
m_0 & m_1 & \cdots & m_p\\
m_1 & m_2 & \cdots & m_{p+1}\\
m_2 & m_3 & \cdots & m_{p+2}\\
\vdots & \vdots & & \vdots\\
m_p & m_{p+1} & \cdots & m_{2p}
\end{vmatrix} = D,$$

these conditions may be written

$$\sum_{0}^{p-1} x_q^r\cdot x_t^s\cdot D_{p+1,p+1,r+1,s+1} = 0 \qquad (19)$$

and

$$\sum_{0}^{p} x_q^r\cdot x_t^s\cdot D_{r+1,s+1} = 0 \qquad (20),$$

where the sums include all combinations of powers with $r$ and $s$ lying between 0
and $(p-1)$, and 0 and $p$ respectively.

Now we have for an orthosymmetrical determinant $\Delta$,

$$\Delta_{r,s}\cdot\Delta_{r',s'} - \Delta\cdot\Delta_{r,s,r',s'} = \Delta_{r,s'}\cdot\Delta_{r',s}.$$

If therefore (19) is multiplied by $D$ and subtracted from (20) multiplied by
$D_{p+1,p+1}$ the coefficient of $x_q^r\cdot x_t^s$ becomes

$$D_{p+1,p+1}\cdot D_{r+1,s+1} - D\cdot D_{p+1,p+1,r+1,s+1} = D_{p+1,r+1}\cdot D_{p+1,s+1}$$

as long as both $r$ and $s$ are smaller than $p$.
When one of them, for example $s$, equals $p$ the term is

$$x_q^r\cdot x_t^p\cdot D_{p+1,p+1}\cdot D_{r+1,p+1},$$

which is of the same form and this also holds for $r = s = p$ when the term is

$$x_q^p\cdot x_t^p\cdot D_{p+1,p+1}^2.$$

The total result is thus

$$\sum_{0}^{p} x_q^r\cdot x_t^s\cdot D_{p+1,r+1}\cdot D_{p+1,s+1} = 0,$$

or in the form of determinants

$$\begin{vmatrix}
1 & m_0 & m_1 & \cdots & m_{p-1}\\
x_q & m_1 & m_2 & \cdots & m_p\\
x_q^2 & m_2 & m_3 & \cdots & m_{p+1}\\
\vdots & \vdots & \vdots & & \vdots\\
x_q^p & m_p & m_{p+1} & \cdots & m_{2p-1}
\end{vmatrix}\cdot
\begin{vmatrix}
1 & m_0 & m_1 & \cdots & m_{p-1}\\
x_t & m_1 & m_2 & \cdots & m_p\\
x_t^2 & m_2 & m_3 & \cdots & m_{p+1}\\
\vdots & \vdots & \vdots & & \vdots\\
x_t^p & m_p & m_{p+1} & \cdots & m_{2p-1}
\end{vmatrix} = 0.$$

Hence $x_q$ and $x_t$ must be roots of

$$\begin{vmatrix}
1 & m_0 & m_1 & \cdots & m_{p-1}\\
x & m_1 & m_2 & \cdots & m_p\\
x^2 & m_2 & m_3 & \cdots & m_{p+1}\\
\vdots & \vdots & \vdots & & \vdots\\
x^p & m_p & m_{p+1} & \cdots & m_{2p-1}
\end{vmatrix} = 0 \qquad (21).$$

When $x_q$ is found from this and substituted in (19) or (20) we get, since the
coefficient of $x_t^p$ in the latter is zero, an equation of the $(p-1)$st degree to determine
$x_t$. It is therefore clear that any pair of roots of (21) determine a pair of
uncorrelated $y$'s.


II. The "best" grouping of observations with constant standard deviation.

(1) It was shown in the last section under (7) that the mean of the squared
standard deviations of the adjusted y taken over the places of observation and weighted
with the number of observations at each place is equal to $\dfrac{\sigma^2}{N}(n+1)$ and that therefore
the curve of squared standard deviation can never be entirely below that value. And
further, that since $(n+1)$ equally big groups of observations at the places of
observations give the squared standard deviation this minimum, there is the
possibility, ${}_n\sigma_y^2$ being of the $2n$th degree in $x$, that by placing the groups at special
positions the curve of squared standard deviation could have those values $\dfrac{\sigma^2}{N}(n+1)$
as its maxima within the range of observations.

Let $x_1, x_2, \ldots, x_p, \ldots, x_{n+1}$ be the places of observations and $\eta_p$ the mean of the
observations at $x_p$, the interpolation formula of Lagrange is then

$$y = S\left\{\frac{(x - x_1)(x - x_2)\cdots(x - x_{n+1})}{(x_p - x_1)(x_p - x_2)\cdots(x_p - x_{n+1})}\,\eta_p\right\},$$

the sum taken over all the places of observation, and the factors containing $x_p$
itself being left out of the numerator and denominator of the $p$th term.

From this we find

$${}_n\sigma_y^2 = \frac{\sigma^2}{N}(n+1)\,S\left\{\left[\frac{(x - x_1)(x - x_2)\cdots(x - x_{n+1})}{(x_p - x_1)(x_p - x_2)\cdots(x_p - x_{n+1})}\right]^2\right\} \qquad (22),$$

which for $x = x_1, x_2, \ldots, x_{n+1}$ equals $\dfrac{\sigma^2}{N}(n+1)$, the $n$ terms of the sum being zero
and the $(n+1)$st taking the value 1 as it ought to. If $x_1$ be the greatest of the
$x$'s it is hence clear that for $x > x_1$, since

$$\left\{\frac{(x - x_2)(x - x_3)\cdots(x - x_{n+1})}{(x_1 - x_2)(x_1 - x_3)\cdots(x_1 - x_{n+1})}\right\}^2 > 1,$$

$${}_n\sigma_y^2 > \frac{\sigma^2}{N}(n+1).$$

The same applies to any $x$ smaller than the smallest of the places of observation.
Therefore as we want ${}_n\sigma_y^2$ to be $\leq \dfrac{\sigma^2}{N}(n+1)$ at the ends of the range we have to place
two of our groups of observations there.


Let us take the half of the range within which it is possible to make observations as 
the unit of x so that the range goes from —1 to 1. 


(2) Hence for a linear function there is no choice left, the two groups of observations
must be at $-1$ and $1$.

According to (22) we have

$${}_1\sigma_y^2 = \frac{\sigma^2}{N}\cdot 2\left\{\frac{(x+1)^2}{4} + \frac{(x-1)^2}{4}\right\},$$

or

$${}_1\sigma_y^2 = \frac{\sigma^2}{N}\cdot 2\left\{1 - \tfrac{1}{2}\left(1 - x^2\right)\right\},$$

which illustrates the well-known fact that by simple interpolation between two
equally good values of a table, we obtain interpolated values with less probable error
than those of the table.


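(3) Investigating a function of the second degree we have a third group to
place besides the two at $-1$ and $1$, that is if we do not beforehand suppose the
distribution to be symmetrical. Let the third group be at $\alpha$, then the interpolation
gives

$$y = \frac{(x-1)(x-\alpha)}{2(1+\alpha)}\,\eta_{-1} + \frac{(x+1)(x-\alpha)}{2(1-\alpha)}\,\eta_{1} + \frac{x^2-1}{\alpha^2-1}\,\eta_{\alpha},$$

from which

$${}_2\sigma_y^2 = \frac{\sigma^2}{N}\cdot 3\left\{\left[\frac{(x-1)(x-\alpha)}{2(1+\alpha)}\right]^2 + \left[\frac{(x+1)(x-\alpha)}{2(1-\alpha)}\right]^2 + \left[\frac{x^2-1}{\alpha^2-1}\right]^2\right\}.$$

We want this to be a maximum for $x = \alpha$, but $\left(\dfrac{d\,{}_2\sigma_y^2}{dx}\right)_{x=\alpha}$ can only vanish for
$\alpha = 0$, in which case ${}_2\sigma_y^2$ is reduced to

$${}_2\sigma_y^2 = \frac{\sigma^2}{N}\cdot 3\left\{1 - \tfrac{3}{2}\,x^2\left(1 - x^2\right)\right\},$$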




















KirsTInE SMITH 15 


which shows that we have succeeded in making ${}_2\sigma_y^2$ a maximum at $x = 0$ and
obtained a squared standard deviation with the maximum value $\dfrac{\sigma^2}{N}\cdot 3$, as we desired.

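(4) For a function of the third degree we find from four groups of observations
at $-1$, $1$, $\alpha$ and $\gamma$ that

$$\begin{aligned}
y = {} & \frac{(x-1)(x-\alpha)(x-\gamma)}{-2(1+\alpha)(1+\gamma)}\,\eta_{-1} + \frac{(x+1)(x-\alpha)(x-\gamma)}{2(1-\alpha)(1-\gamma)}\,\eta_{1}\\
& + \frac{(x^2-1)(x-\gamma)}{(\alpha^2-1)(\alpha-\gamma)}\,\eta_{\alpha} + \frac{(x^2-1)(x-\alpha)}{(\gamma^2-1)(\gamma-\alpha)}\,\eta_{\gamma},
\end{aligned}$$

and

$$\begin{aligned}
{}_3\sigma_y^2 = \frac{\sigma^2}{N}\cdot 4\Biggl\{ & \frac{\left[(x-1)(x-\alpha)(x-\gamma)\right]^2}{4(1+\alpha)^2(1+\gamma)^2} + \frac{\left[(x+1)(x-\alpha)(x-\gamma)\right]^2}{4(1-\alpha)^2(1-\gamma)^2}\\
& + \left[\frac{(x^2-1)(x-\gamma)}{(1-\alpha^2)(\alpha-\gamma)}\right]^2 + \left[\frac{(x^2-1)(x-\alpha)}{(1-\gamma^2)(\alpha-\gamma)}\right]^2 \Biggr\}.
\end{aligned}$$

The condition $\left(\dfrac{d\,{}_3\sigma_y^2}{dx}\right)_{x=\alpha} = 0$
requires $3\alpha^2 - 2\alpha\gamma - 1 = 0$,
and $\left(\dfrac{d\,{}_3\sigma_y^2}{dx}\right)_{x=\gamma} = 0$
requires $3\gamma^2 - 2\alpha\gamma - 1 = 0$,
from which is got $\alpha^2 = \gamma^2$,
and, since $\alpha \neq \gamma$, $\alpha^2 = \gamma^2 = \tfrac{1}{5}$.

By introducing this value for $\alpha^2$ and $\gamma^2$ in ${}_3\sigma_y^2$ we find

$${}_3\sigma_y^2 = \frac{\sigma^2}{N}\cdot 4\left\{1 - \frac{3\cdot 5^2}{2^4}\left(x^2 - \tfrac{1}{5}\right)^2\left(1 - x^2\right)\right\},$$

which has the required maxima at $\pm\sqrt{\tfrac{1}{5}}$.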
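
The reduction for the third degree can be checked numerically. The sketch below, an illustration with hypothetical function names, evaluates $N\,{}_n\sigma_y^2/\sigma^2$ directly from the Lagrange form (22) for equal groups at $\pm 1$ and $\pm\sqrt{1/5}$, compares it with the closed form $4\{1 - \tfrac{75}{16}(x^2 - \tfrac15)^2(1 - x^2)\}$ on a grid, and records the largest value met within the range.

```python
import math


def lagrange_variance_factor(nodes, x):
    """Return N * sigma_y^2 / sigma^2 at x from (22), for equal groups at `nodes`."""
    total = 0.0
    for p, xp in enumerate(nodes):
        basis = 1.0
        for q, xq in enumerate(nodes):
            if q != p:
                basis *= (x - xq) / (xp - xq)
        total += basis ** 2
    return len(nodes) * total


def cubic_closed_form(x):
    """Closed form 4 * {1 - 75/16 * (x^2 - 1/5)^2 * (1 - x^2)} for the third degree."""
    return 4.0 * (1.0 - 75.0 / 16.0 * (x * x - 0.2) ** 2 * (1.0 - x * x))


alpha = math.sqrt(1.0 / 5.0)
cubic_nodes = [-1.0, -alpha, alpha, 1.0]

# Grid of 401 equally spaced arguments covering the range from -1 to 1.
grid = [i / 200.0 - 1.0 for i in range(401)]
largest = max(lagrange_variance_factor(cubic_nodes, x) for x in grid)
```

On this grid the two expressions agree to rounding error and the largest value does not exceed $n + 1 = 4$, the value taken at the places of observation.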


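(5) For the functions of higher degree we shall at once assume that the distributions
sought are symmetrical, since it is pretty clear from the symmetry of
$y$ and $\sigma_y$ with regard to the sought positions that it must be so.

To determine a function of the fourth degree let us put groups of observations
at $\pm 1$, $\pm\alpha$ and $0$. The expression for ${}_4\sigma_y^2$ can be written down at once and is such
that the terms arising from the groups at $+1$ and $-1$ can be put together as well
as the terms from $+\alpha$ and $-\alpha$, then

$${}_4\sigma_y^2 = \frac{\sigma^2}{N}\cdot 5\left\{\frac{x^2(x^2-\alpha^2)^2(x^2+1)}{2(1-\alpha^2)^2} + \frac{x^2(x^2-1)^2(x^2+\alpha^2)}{2\alpha^4(1-\alpha^2)^2} + \frac{\left[(x^2-1)(x^2-\alpha^2)\right]^2}{\alpha^4}\right\}.$$

$\left(\dfrac{d\,{}_4\sigma_y^2}{dx}\right)_{x=\alpha} = 0$ provides the condition $\dfrac{3}{2\alpha^2} = \dfrac{2}{1-\alpha^2}$, or

$$\alpha^2 = \tfrac{3}{7},$$

with which value the squared standard deviation becomes

$${}_4\sigma_y^2 = \frac{\sigma^2}{N}\cdot 5\left\{1 - \frac{5\cdot 7^2}{2^4}\,x^2\left(x^2 - \tfrac{3}{7}\right)^2\left(1 - x^2\right)\right\},$$

which has the required characteristics.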
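(6) Adjusting by a function of the fifth degree six equally big groups of observations
at the arguments $\pm 1$, $\pm\alpha$ and $\pm\gamma$ the squared standard deviation of the
adjusted $y$ is

$$\begin{aligned}
{}_5\sigma_y^2 = \frac{\sigma^2}{N}\cdot 6\Biggl\{ & \frac{(x^2-\alpha^2)^2(x^2-\gamma^2)^2(x^2+1)}{2(1-\alpha^2)^2(1-\gamma^2)^2} + \frac{(x^2-1)^2(x^2-\gamma^2)^2(x^2+\alpha^2)}{2\alpha^2(1-\alpha^2)^2(\alpha^2-\gamma^2)^2}\\
& + \frac{(x^2-1)^2(x^2-\alpha^2)^2(x^2+\gamma^2)}{2\gamma^2(1-\gamma^2)^2(\alpha^2-\gamma^2)^2} \Biggr\}.
\end{aligned}$$

The condition for maximum at $x = \pm\alpha$ is

$$9\alpha^4 - 5\alpha^2\gamma^2 - 5\alpha^2 + \gamma^2 = 0,$$

which together with the condition for maximum at $x = \pm\gamma$

$$9\gamma^4 - 5\alpha^2\gamma^2 - 5\gamma^2 + \alpha^2 = 0,$$

since $\alpha^2$ must be $\neq \gamma^2$ results in

$$\alpha^2 + \gamma^2 = \tfrac{2}{3} \quad\text{and}\quad \alpha^2\gamma^2 = \tfrac{1}{21},$$

or

$$\left.\begin{matrix}\alpha^2\\ \gamma^2\end{matrix}\right\} = \frac{7 \pm 2\sqrt{7}}{21}.$$

When these values are substituted in the expression above for ${}_5\sigma_y^2$ this may by
somewhat lengthy algebraic operations be brought into the form

$${}_5\sigma_y^2 = \frac{\sigma^2}{N}\cdot 6\left\{1 - \frac{3^3\cdot 5\cdot 7^2}{2^7}\left(x^2-\alpha^2\right)^2\left(x^2-\gamma^2\right)^2\left(1-x^2\right)\right\}.$$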

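(7) For a function of the sixth degree the observations may be supposed to
be at $\pm 1$, $\pm\alpha$, $\pm\gamma$ and $0$.

The expression for the squared standard deviation of an adjusted $y$ becomes

$$\begin{aligned}
{}_6\sigma_y^2 = \frac{\sigma^2}{N}\cdot 7\Biggl\{ & \frac{x^2(x^2-\alpha^2)^2(x^2-\gamma^2)^2(x^2+1)}{2(1-\alpha^2)^2(1-\gamma^2)^2} + \frac{x^2(x^2-1)^2(x^2-\gamma^2)^2(x^2+\alpha^2)}{2\alpha^4(1-\alpha^2)^2(\alpha^2-\gamma^2)^2}\\
& + \frac{x^2(x^2-1)^2(x^2-\alpha^2)^2(x^2+\gamma^2)}{2\gamma^4(1-\gamma^2)^2(\alpha^2-\gamma^2)^2} + \frac{\left[(x^2-1)(x^2-\alpha^2)(x^2-\gamma^2)\right]^2}{\alpha^4\gamma^4} \Biggr\}.
\end{aligned}$$

A maximum at $x = \pm\alpha$ requires

$$11\alpha^4 - 7\alpha^2\gamma^2 - 7\alpha^2 + 3\gamma^2 = 0,$$

and a maximum at $x = \pm\gamma$ requires

$$11\gamma^4 - 7\alpha^2\gamma^2 - 7\gamma^2 + 3\alpha^2 = 0,$$

which added and subtracted provide

$$11\left(\alpha^2+\gamma^2\right)^2 - 36\alpha^2\gamma^2 - 4\left(\alpha^2+\gamma^2\right) = 0,$$

and

$$\left(\alpha^2-\gamma^2\right)\left\{11\left(\alpha^2+\gamma^2\right) - 10\right\} = 0.$$

Since we must have $\alpha^2 \neq \gamma^2$,

$$\alpha^2 + \gamma^2 = \tfrac{10}{11} \quad\text{and}\quad \alpha^2\gamma^2 = \tfrac{5}{33},$$

or

$$\left.\begin{matrix}\alpha^2\\ \gamma^2\end{matrix}\right\} = \frac{15 \pm 2\sqrt{15}}{33}.$$

The expression for ${}_6\sigma_y^2$ may after rather laborious operations be brought into the
form

$${}_6\sigma_y^2 = \frac{\sigma^2}{N}\cdot 7\left\{1 - \frac{3^3\cdot 7\cdot 11^2}{2^7}\,x^2\left(x^2-\alpha^2\right)^2\left(x^2-\gamma^2\right)^2\left(1-x^2\right)\right\}.$$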
(8) It is thus, as we aimed at, shown for functions up to the sixth degree that
by distributing the observations in (n + 1) equally big groups and choosing the places 
of these groups in one special way we can manage to keep the standard deviation of any 
adjusted y within the possible range of observations less than the standard deviation at 
the places of observation. There is every reason to believe that the rule holds for 
any degree of function, but as the general proof would be very complicated and as 
almost all practical cases will be covered by functions up to the sixth degree, the 
problem can therefore be left at this stage. 

As we have proved, any other distribution of observations leads to a curve of 
squared standard deviation that has a higher maximum value within the range. This 
special set of (n + 1) groups has therefore a very conspicuous advantage over all 
other distributions of observations. The application of it is however limited in that 
it demands that the degree of the function must be known beforehand and thus the observations
do not provide any justification for the form of function chosen. If however the
function has been fully investigated beforehand and there is no doubt about its form, 
(n + 1) equally big groups of observations placed as indicated are the most desirable 
set of observations possible. The approximate values of the places of the groups 
are given in the table below. 


                                  TABLE I.

Degree of function       1st       2nd       3rd       4th       5th       6th

Places of             ±1.0000   ±1.0000   ±1.0000   ±1.0000   ±1.0000   ±1.0000
observation               -      0.0000   ±0.4472   ±0.6547   ±0.7651   ±0.8302
                          -         -         -      0.0000   ±0.2852   ±0.4689
                          -         -         -         -         -      0.0000

With rougher approximation the intervals between the observations, still 
expressed by the half range as unit, are as follows: 


Ist degree of function 2 

2nd an ss hae ; 
3rd ne a 

ae i443 

° ae ae Oe ae Be te 

6th » 99 : $2 So 8 
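
The places of Table I and the intervals between them follow from the closed forms obtained in (2) to (7) of this section. The sketch below is an illustration only; the function names are hypothetical, and the intervals it produces are computed from the closed forms rather than copied from the rougher printed values. The computed places agree with Table I to within a unit in the fourth decimal.

```python
import math


def best_places(degree):
    """Non-negative places of the groups on the range -1 to 1, degrees 1 to 6."""
    r = math.sqrt
    table = {
        1: [1.0],
        2: [1.0, 0.0],
        3: [1.0, r(1 / 5)],
        4: [1.0, r(3 / 7), 0.0],
        5: [1.0, r((7 + 2 * r(7)) / 21), r((7 - 2 * r(7)) / 21)],
        6: [1.0, r((15 + 2 * r(15)) / 33), r((15 - 2 * r(15)) / 33), 0.0],
    }
    return table[degree]


def all_places(degree):
    """All n + 1 places, the distribution being symmetrical about the middle."""
    half = best_places(degree)
    return sorted({-p for p in half} | set(half))


def intervals(degree):
    """Intervals between neighbouring places, with the half range as unit."""
    pts = all_places(degree)
    return [b - a for a, b in zip(pts, pts[1:])]
```

For example, `intervals(3)` gives approximately 0.553, 0.894, 0.553, and the intervals for any degree sum to the whole range 2.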


The six curves of standard deviation are represented in Diagram 1. It will be 
seen that the minima of a curve, if it has more than two, are the lower the 
greater their distances from the middle of the range, so that the variation of the 
standard deviation is greatest in the outermost intervals of the range. 


III. Uniform continuous distribution of observations with constant standard
deviation. General formulae.


(1) As was pointed out in the last section the lumping up of observations in 
groups just necessary to determine the constants of the function in question has 
some drawbacks and cannot be recommended as a universal rule. In many cases 
it is through the observations themselves that we first get to know the form of the 


Biometrika XII 








[DIAGRAM I. Curves of Standard Deviations. Equally big clusters at definite points.]

function, and thus a full investigation may require more groups of observations
than merely a number equal to the assumed number of constants in the formula. 
Besides, even when we believe we know on theoretical or other grounds beforehand
the nature of the function a priori we may consider it prudent to distribute
the observations so that they supply us with data whereby we may control our 
hypothesis that the assumed function is the right one. 


It is therefore desirable to find other forms of distributions which, at the same 
time as they make the standard deviation of the adjusted function vary little 
inside the range of observations, are more uniformly spread over this range. 


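(2) A uniform continuous distribution at once recommends itself as the simplest
assumption. As we suppose the observations to have constant standard deviations
the elements of the determinants of (15) are the moment coefficients of the x's at
the places of observation.


When the N observations are uniformly spread between x = -1 and x = 1,
\[ \mu_{2r} = \frac{1}{2r+1} \quad\text{and}\quad \mu_{2r+1} = 0, \]
and the expression for ${}_{2p}\sigma_y^2$ is, according to (15),
\[
{}_{2p}\sigma_y^2 = \frac{\sigma^2}{N}\left\{ 1 + \frac{x^2}{\mu_2}
+ \frac{\begin{vmatrix} 1 & 1 \\ x^2 & \mu_2 \end{vmatrix}^2}{\begin{vmatrix} 1 & \mu_2 \\ \mu_2 & \mu_4 \end{vmatrix}}
+ \cdots
+ x^2\,\frac{\begin{vmatrix} 1 & \mu_2 & \cdots & \mu_{2p-2} \\ x^2 & \mu_4 & \cdots & \mu_{2p} \\ \vdots & \vdots & & \vdots \\ x^{2p-2} & \mu_{2p} & \cdots & \mu_{4p-4} \end{vmatrix}^2}
{\begin{vmatrix} \mu_2 & \cdots & \mu_{2p-2} \\ \vdots & & \vdots \\ \mu_{2p-2} & \cdots & \mu_{4p-6} \end{vmatrix}\cdot
 \begin{vmatrix} \mu_2 & \cdots & \mu_{2p} \\ \vdots & & \vdots \\ \mu_{2p} & \cdots & \mu_{4p-2} \end{vmatrix}}
+ \frac{\begin{vmatrix} 1 & 1 & \mu_2 & \cdots & \mu_{2p-2} \\ x^2 & \mu_2 & \mu_4 & \cdots & \mu_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ x^{2p} & \mu_{2p} & \mu_{2p+2} & \cdots & \mu_{4p-2} \end{vmatrix}^2}
{\begin{vmatrix} 1 & \mu_2 & \cdots & \mu_{2p-2} \\ \mu_2 & \mu_4 & \cdots & \mu_{2p} \\ \vdots & \vdots & & \vdots \\ \mu_{2p-2} & \mu_{2p} & \cdots & \mu_{4p-4} \end{vmatrix}\cdot
 \begin{vmatrix} 1 & \mu_2 & \cdots & \mu_{2p} \\ \mu_2 & \mu_4 & \cdots & \mu_{2p+2} \\ \vdots & \vdots & & \vdots \\ \mu_{2p} & \mu_{2p+2} & \cdots & \mu_{4p} \end{vmatrix}}
\right\} \qquad (23)
\]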


By this formula we may evaluate successively ${}_0\sigma_y^2$, ${}_1\sigma_y^2$, ... ${}_{2p}\sigma_y^2$, when we know
the two general terms of which the sum consists.

(3) The determinant of the order p,
\[
{}^{q}_{p}\Delta = \begin{vmatrix}
\frac{1}{2q-1} & \frac{1}{2q+1} & \cdots & \frac{1}{2q+2p-3} \\
\frac{1}{2q+1} & \frac{1}{2q+3} & \cdots & \frac{1}{2q+2p-1} \\
\vdots & \vdots & & \vdots \\
\frac{1}{2q+2p-3} & \frac{1}{2q+2p-1} & \cdots & \frac{1}{2q+4p-5}
\end{vmatrix},
\]
which includes the two types of the denominators in (23), shall first be evaluated.


We find
\[ {}^{q}_{1}\Delta = \frac{1}{2q-1}, \qquad {}^{q}_{2}\Delta = \frac{2^2}{(2q-1)(2q+1)^2(2q+3)}, \]
and it shall be proved that if
\[ {}^{q}_{p}\Delta = \{1^{p-1}\cdot 2^{p-2} \cdots (p-2)^2 (p-1)\}^2 \cdot 2^{p(p-1)} \cdot {}^{q}_{p}\Pi \qquad (24) \]
holds up to the order p, ${}^{q}_{p}\Pi$ being the product of the elements of ${}^{q}_{p}\Delta$, the rule holds
for determinants of any order.

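It is clear that
\[ {}^{q}_{p+1}\Delta_{1,1} = {}^{q+2}_{p}\Delta, \qquad {}^{q}_{p+1}\Delta_{p+1,p+1} = {}^{q}_{p}\Delta, \qquad {}^{q}_{p+1}\Delta_{1,p+1} = (-1)^p\,{}^{q+1}_{p}\Delta \]
and
\[ {}^{q}_{p+1}\Delta_{1,1,p+1,p+1} = {}^{q+2}_{p-1}\Delta. \]


If we therefore in the general relation for an orthosymmetrical determinant
\[ \Delta = \frac{\Delta_{ss}\,\Delta_{s's'} - \Delta_{ss'}^2}{\Delta_{ss\,s's'}} \]
put s = 1 and s' = p + 1 and $\Delta = {}^{q}_{p+1}\Delta$, we find
\[ {}^{q}_{p+1}\Delta = \frac{{}^{q}_{p}\Delta \cdot {}^{q+2}_{p}\Delta - \left({}^{q+1}_{p}\Delta\right)^2}{{}^{q+2}_{p-1}\Delta}, \]
and, using (24),
\[ {}^{q}_{p+1}\Delta = \frac{\{1^{p-1}\cdot 2^{p-2} \cdots (p-2)^2 (p-1)\}^4}{\{1^{p-2}\cdot 2^{p-3} \cdots (p-3)^2 (p-2)\}^2}\cdot 2^{(p-1)(p+2)}\cdot \frac{{}^{q}_{p}\Pi \cdot {}^{q+2}_{p}\Pi - \left({}^{q+1}_{p}\Pi\right)^2}{{}^{q+2}_{p-1}\Pi}. \]
Now, according to the definition of $\Pi$,
\[ \frac{{}^{q}_{p}\Pi \cdot {}^{q+2}_{p}\Pi}{{}^{q+2}_{p-1}\Pi} = {}^{q}_{p+1}\Pi \times (2q+2p-1)^2, \qquad
   \frac{\left({}^{q+1}_{p}\Pi\right)^2}{{}^{q+2}_{p-1}\Pi} = {}^{q}_{p+1}\Pi \times (2q-1)(2q+4p-1), \]
and
\[ \frac{1}{{}^{q}_{p+1}\Pi} = (2q-1)(2q+1)^2(2q+3)^3 \cdots (2q+4p-3)^2 (2q+4p-1). \]
Hence
\[ {}^{q}_{p+1}\Delta = \{1^{p}\cdot 2^{p-1} \cdots (p-2)^3 (p-1)^2\}^2 \cdot 2^{(p-1)(p+2)} \cdot {}^{q}_{p+1}\Pi \cdot \left[(2q+2p-1)^2 - (2q-1)(2q+4p-1)\right], \]
\[ {}^{q}_{p+1}\Delta = \{1^{p}\cdot 2^{p-1} \cdots (p-1)^2\, p\}^2 \cdot 2^{p(p+1)} \cdot {}^{q}_{p+1}\Pi, \]
which agrees with (24).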


(4) Next we have to evaluate the minors of ${}^{q}_{p}\Delta$ necessary for calculating the
numerators in (23). For this purpose we only need the minors ${}^{q}_{p}\Delta_{s,p}$, but to carry
through the proof by induction ${}^{q}_{p}\Delta_{s,r}$ for any values of s and r is needed.


For ${}^{q}_{3}\Delta_{s,r}$ we directly find, for instance,
\[ {}^{q}_{3}\Delta_{2,3} = \frac{-2^2\cdot 2}{(2q-1)(2q+1)(2q+3)(2q+5)} \]
and
\[ {}^{q}_{3}\Delta_{2,2} = \frac{2^2\cdot 2^2}{(2q-1)(2q+3)^2(2q+7)}; \]
these both agree with the following formula which will be proved by induction,
\[ {}^{q}_{p}\Delta_{s,r} = (-1)^{r+s}\,\beta_{p-1,s-1}\cdot\beta_{p-1,r-1}\cdot\{1^{p-2}\cdot 2^{p-3} \cdots (p-3)^2 (p-2)\}^2 \cdot 2^{(p-1)(p-2)} \cdot {}^{q}_{p}\Pi_{s,r}. \qquad (25) \]
$\beta_{p-1,s-1}$ is the binomial coefficient $\frac{(p-1)!}{(s-1)!\,(p-s)!}$ and ${}^{q}_{p}\Pi_{s,r}$ the product of all the
elements of ${}^{q}_{p}\Delta_{s,r}$.
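
Formula (25) can be tried out numerically before following the induction. The sketch below is an illustration only; it assumes that the minor with indices s, r is the cofactor obtained by striking out row s and column r and attaching the sign (-1)^(s+r), and that the product in (25) runs over the elements that remain. The names `cofactor` and `formula_25` are hypothetical.

```python
from fractions import Fraction
from math import comb, prod


def det(m):
    """Exact determinant of a square matrix of Fractions (Gaussian elimination)."""
    m = [row[:] for row in m]
    n, d = len(m), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return d


def submatrix(q, p, s, r):
    """Elements left after deleting row s and column r (1-based) of the order-p determinant."""
    return [[Fraction(1, 2 * q + 2 * (i + j) - 1) for j in range(p) if j != r - 1]
            for i in range(p) if i != s - 1]


def cofactor(q, p, s, r):
    """Signed minor, assumed to be what the text writes with the indices s, r."""
    return (-1) ** (s + r) * det(submatrix(q, p, s, r))


def formula_25(q, p, s, r):
    """Right-hand side of (25) as reconstructed here."""
    remaining = prod((e for row in submatrix(q, p, s, r) for e in row), start=Fraction(1))
    c = prod(k ** (p - 1 - k) for k in range(1, p - 1))   # 1^(p-2) 2^(p-3) ... (p-2)
    return ((-1) ** (s + r) * comb(p - 1, s - 1) * comb(p - 1, r - 1)
            * c ** 2 * 2 ** ((p - 1) * (p - 2)) * remaining)


if __name__ == "__main__":
    for q in (1, 2):
        for p in range(2, 6):
            for s in range(1, p + 1):
                for r in range(1, p + 1):
                    assert cofactor(q, p, s, r) == formula_25(q, p, s, r)
    print("(25) agrees with direct evaluation for p = 2..5")
```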


The relation has to be proved first for r = s, then for r = p and finally for any 
combination s and r. 
For the first two proofs we use the relation between the minors of an ortho- 
symmetrical determinant 
A. Agee’ » Ages’ a” — Argy 9” 


= - Se ee eee e eee eeeeee 26). 
Ay» Ags gs" Agss' «” ar A, sia Ay, ss” ( ) 


This is found from two relations given by Professor Pearson* by dividing one 
of them by the other. 


q 
(5) Let A be ,,,A, s’ = 1 and s’’= p+ 1, then 


A q qd a 
= p+ 1Assi1 ° 9+14,, 8,p+1,9+1 n+ As, 9,1, 041 .-.(27). 
qa q qa qa qa 
+141, p41 94144, 1, 941, 941° 9+14s,2,1, 941+ 9+1Be+i, ott. 21° 941 OL La 041 
a q+2 
Now +t Asei1 a! oAs1,0-1> 
qd qa 
9114s, s, 941,941 ea Ass, 
‘a , qa+1 
+1 A4,1, 941, 941 = (— 1)?*? 4A), 5, 


qa q+1 
o+1Xs, 2,1, 941 = (= 1)? oMe-1,29 


* Biometrika, Vol. XI, pp. 232-3.
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& . q+1 
p+19+1, p+i,3,1 — (— * oAg,s, 


@ qa+1 
9+141,1,5, 041 =(-—7?" oAs-1,1) 
so that all the determinants on the right side in (27) can be evaluated by (25). 
They all have the factor 


{19-2 4 Qp-3 eres (p ae 3)? (p ie 2)}2 , 2\p-l) (p—2) 
in common, when that is divided out there remains 
q : q+2 q a+1 
p+1 a aii (— 1)?B_4, 8—2° | s—1 (glI,_1,0-1- atlas ee gi l?_1, 0) (28). 


q q+i q+1 q+1 qvi tee y) 
D+1 Ai, o+1 Pet, 0-2 ° Bos 3 (pT 51,1 * p*+y,s ~— ptty,1 > “1.0 


Now indicating by C, the product of the elements of the rth column or rth 


q q 
row in ,,,A and by e,, the element of the ,,,A common for the rth row and sth 


column we find q q+2 C? 
p+ill 5 = p*4+s—1,s—1° 


2? 
1) «O51 
a a 2 

f= 8 ee 
p+i4+tss pitss- 2 ’ 

Cn+1, p+ * Spti,s 

1 

a OC o41 


qa 
polls, = p*!s—1,8° - . 
1, p41 + 0s, 1+ es, p41 


Hence the factor of the numerator in (28) is reduced to 


.. 6.4. 
2 “41° “s, +1 2 
o+i tli, C2.C2.. {11 -€p41, 041 — 41, p41}: 
be ntl 


For the ITI’s of the denominator we find 
q+1 C,.C, 


q 
p41), p41 = pits-1,1* > , 
53 -€s, p43 44,1 
I 
tl ie ct Coys . C, m 
9+1°*1,9+1 ~~ ° v DS* 6 e e ? 
“8, D+1* “p+1, p+1°* “1,8 


1 
il ce Fi ee 
p+t*hy, +1 = ptty1- 





1, p+1- 011 - p41, o+1 
q q+1 C? 

p+1 II, o11 ™ o**s-1,8°— = 

Ose+0 51 -Cs, p41 


the factor containing II’s of the denominator of (28) is therefore equal to 





g G78 e e 
2 1s “p+1,8°* “1,1 ° Yp+1, +1 
p+1 Ij, p+1 Gs C : C ih {e1, * €p41,8 — 1, n41- Css} 
7 p+1° x 


Introducing these two expressions in (28) and substituting for the one factor 





q 
II oS Ae e 
2t1—" the value —1-.?*1, “*_ we hence find 
II 8 1, p41 
p+1**1, p+1 | 1 
q Pes. gee i a 
2 
94 Ass = (—1)°8 Bp Ci pty 11 - Opt, p41 pills 
qa oe p—1, 8—2 YP p—1, s—1 1 1 . q 
+1 Ay, p41 = _ a ve 


Css-y, 941 1, 8-P p41, 
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The fraction containing e’s equals 








(29+ 2p—1)®—(2g—1)(2g+4p—1) Pp 
(2q + 4s — 5) (2q + 2p — 1) — (2q + 2e — 3) (2g + 28+ 2p— 3) (@—1)(p—s+)’ 
a a 
Bn i® 
hence x an = (— 1)?B2 4 Ee ie 
pa 9+1~“1, p+1 p+1 1, p+1 
Qa q+1 qd 
94144, 941=(—1)? ,A = (— 1)? {1?-1. 29-2... (p — 2)8 (p — 1)P. 29-9 Ty on, 
we therefore find 
q a 
941 Ass = BF, 1 {19 . 27-*....... (p — 2)? (p — 1)}*. 299», , 


agreeing with (25). 


q 
(6) To evaluate os: we shall in (26) put A=,,,A, s=1, s’=s and 
s’’=p-+1. Reversing the fractions we then get 


g @ a qg gq 
p+1 A, p+1 _ +1 A, 8, D+1, D+1° D+1 A,, 1, 8, D+1 + p+1 Ags, p+1,1, 8° oti As, 81, p+1 





q q q q 
D+1 An p+1 A, 1,8,8° p+1 A,, 1, +1, p+1 ~~ ~+I Aj, 1, 8, p+1 
scGeeseseees (29). 
q q 
As +1 4¢,s, 941, 041 = o Ass, 
qa q+2 q+1 
p+1 A,, 1, 8, p+1 = pAs-1, are (— ee pean; 
q - qth 
p+1 Asst, p+1,1,3 — (— 1) ihe, p? 
@ q+1 
p+1 A, 8,1, p+1 — (— TL)? Me-1, 2: 
1 q+2 
p+1 A,, tae ghe-s, 8-1? 
a q 
p+1 Ay, 1, 9+1, p+1 = pAt1> 
the right side of (29) can be evaluated by (25). 
We thus get 
qa q+ q+1 qd q+2 
p+1 A,, p+1__ (— idle Bo-1, s—-1° By-1, 8—z ° By-1, s—1 (—, 3, 8 . gil, at sil,, : pl s_1, ») 
Ce ae iy ee q+2 q q+1 
p+1 Ani Bi -1, s—2 (tts, s—] ° pli, es pili, s-1) (30) 


q 
We want here to express the II’s of the numerator by ,,,II,,,, and those of 


qg ° ° 
the denominator by ,,,11,,, and we find the following relations 
i aH C,.C, 
2.01 O42 61°. ho Ue 
hi Orgs 0s, 042 -Sre 
i a T Gee 
p+1 8,D+1~™”° + p**sp* e e e ’ 
1, p+1* “+1, p+1°“1,8 
. 4 A oe 
9+1°**3, 941° ° &#2=32€f® *ss ’ 
€s, o+1 + Css» p41, +1 
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fi q+2 C,.C,; 
pt+14+s,p+1 = p*ts-1,9> > ae 
€1,1 + 18+ 1, pti 
q q+2 C..0 
| Ae 8 
and oll, == lI - ua 
oe ° is 
i on tl Chat 
1-H = sh > a 
C41, pt1 * 1, p41 
q q+1 C..0 
eo 3+ U p+ 
onlly, = lly. —o_C 


Cs, p+1 > &1, +1 * M18 


Substituting the II’s found from these relations into (30) and eliminating the 
q 


II 
one factor ?**—*?** by 





p+1 iy q 
ptll,, p+1 _ Ci _ es, p44 
. oe 
i Cy-Cos C11 
p+itty 
we get 1 1 - 
es 122 Ee a ae i 
ottAs, 941 Ps ( i Bo -1, s—1 ©, 3-01, p31 €1,.1 + &s, p41 pti ll, p+1 
_— = e . > 
q ae ae fi 
p+1™11 2 Vip es p+1 1,1 
p a C3, p+1 Csg-O p11, p41 
or introducing the values of the e’s 
q 
v+tAs, o+1 
q 
+144, 1 


q 
(—1)*+9+48) 4.s-1 (29-+28—3)(2g+2p—1)—(2q—1)(2g+2p+28—3) pyle. os 
Po-1,s-2 "(2g + 2p+ 2s — 3)?— (29+ 4s —5)( 


2q+4p-—1 a 
eo ee 
q 
p+1441,1 
Now 
q q+2 ; | , 
941411 = pA = {19-! . 29-*...... (p — 2)? (p — 1)}*.27"—) Th, 

and hence 

A q 
pris, p+1 — (— are 8 as ase , Je-8 pieci (p — 2)2 (p — 1)}?. 9p (p—1) pial. 5 


in agreement with (25). 


a 2 : q 
(7) It now remains to prove that (25) holds for ,,,A,, when both s and r 
are different from 1 and p+ 1, and r different from s. 


For this shall be used the relation 
A ° A ses’ ee A, ° Ag 5” a Ags . Ags” 


between an orthosymmetrical determinant and its minors. 


Putting A= ,,,A, 3= p+ 1, s’ =r and s” =s and solving the equation with 
regard to ,,,A,, we have 
1 
pt Ars = : BEE (ner4 - o4pAgs1, ptt.0,8 + tt Apit.r- 9414p41,0) 


p+1“p+l, D+1 
where oexibnts bees A 


p rs° 
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Evaluating this by (24) and (25) we get 


qg (-— 1)r+s : ‘ . 
p+t4s,5= ee CP EOC cncus (p — 2)?(p—1)}®. 27 7-» 


Pp 
q qa qa q 
x [4p*B8,_1, s—1° By-1, r—1° pti ll =: Ape + Bo, 8-1: B,, T—-1* p+1 | ree ° arey | ey s]---(31). 


qg i] 








C41, pt+1 + Or, pt+1 + &s, p41 
But oll,,.= pu ll,,s > > 
pt+1 
il i i C, er, p+1 
p+1**p+1,7 ~ +1 ne C ‘<a. = 2 
p+1 e,, 8 
qa q (2 
= p+ 
and 94 tl = gil ‘ ‘ 
D+1, P+1 
Y 
il an Cou span 
pt+l p+1,3s~ p C z 4 
s p41, p41 


Substituting these values in (31) we find 


q . q 
ore, = (—1)°4*,19-* . 29-4... (p — 2)*(p — 1)P.29 9) Bes os Bo-s0-1-P*- stall ee 


1 
$+ oe 
” Me eee Tee ee 
1 1 AS 


e 








“e 
e T, D+1 8, D+1 
and as the last fraction. equals 


1 
(p—s+1)(p—r+1)’ 
q q 
pian e= (— Bp, o-1-By,ra (1-2. 29-2... (p — 2)8(p— 1)}*. 2°" TT, ,, 
with which the proof by induction for (25) is carried through. 





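(8) We shall now return to (23). It consists of 2p + 1 terms of which the
(2r + 1)st originally was found as $({}_{2r}\sigma_y^2 - {}_{2r-1}\sigma_y^2)$ so that
\[
{}_{2r}\sigma_y^2 - {}_{2r-1}\sigma_y^2 = \frac{\sigma^2}{N}\,
\frac{\begin{vmatrix} 1 & 1 & \frac13 & \cdots & \frac{1}{2r-1} \\ x^2 & \frac13 & \frac15 & \cdots & \frac{1}{2r+1} \\ \vdots & \vdots & \vdots & & \vdots \\ x^{2r} & \frac{1}{2r+1} & \frac{1}{2r+3} & \cdots & \frac{1}{4r-1} \end{vmatrix}^2}
{\begin{vmatrix} 1 & \frac13 & \cdots & \frac{1}{2r-1} \\ \frac13 & \frac15 & \cdots & \frac{1}{2r+1} \\ \vdots & \vdots & & \vdots \\ \frac{1}{2r-1} & \frac{1}{2r+1} & \cdots & \frac{1}{4r-3} \end{vmatrix}\cdot
 \begin{vmatrix} 1 & \frac13 & \cdots & \frac{1}{2r+1} \\ \frac13 & \frac15 & \cdots & \frac{1}{2r+3} \\ \vdots & \vdots & & \vdots \\ \frac{1}{2r+1} & \frac{1}{2r+3} & \cdots & \frac{1}{4r+1} \end{vmatrix}}
\]
and
\[
{}_{2r-1}\sigma_y^2 - {}_{2r-2}\sigma_y^2 = \frac{\sigma^2 x^2}{N}\,
\frac{\begin{vmatrix} 1 & \frac13 & \frac15 & \cdots & \frac{1}{2r-1} \\ x^2 & \frac15 & \frac17 & \cdots & \frac{1}{2r+1} \\ \vdots & \vdots & \vdots & & \vdots \\ x^{2r-2} & \frac{1}{2r+1} & \frac{1}{2r+3} & \cdots & \frac{1}{4r-3} \end{vmatrix}^2}
{\begin{vmatrix} \frac13 & \frac15 & \cdots & \frac{1}{2r-1} \\ \vdots & \vdots & & \vdots \\ \frac{1}{2r-1} & \frac{1}{2r+1} & \cdots & \frac{1}{4r-5} \end{vmatrix}\cdot
 \begin{vmatrix} \frac13 & \frac15 & \cdots & \frac{1}{2r+1} \\ \vdots & \vdots & & \vdots \\ \frac{1}{2r+1} & \frac{1}{2r+3} & \cdots & \frac{1}{4r-1} \end{vmatrix}}.
\]

With the notations adopted above we therefore find
\[ {}_{2r}\sigma_y^2 - {}_{2r-1}\sigma_y^2 = \frac{\sigma^2}{N}\,\frac{\left[\sum_{s=0}^{s=r} x^{2s}\;{}^{1}_{r+1}\Delta_{s+1,r+1}\right]^2}{{}^{1}_{r}\Delta \cdot {}^{1}_{r+1}\Delta} \]
and
\[ {}_{2r-1}\sigma_y^2 - {}_{2r-2}\sigma_y^2 = \frac{\sigma^2 x^2}{N}\,\frac{\left[\sum_{s=0}^{s=r-1} x^{2s}\;{}^{2}_{r}\Delta_{s+1,r}\right]^2}{{}^{2}_{r-1}\Delta \cdot {}^{2}_{r}\Delta}. \]

Substituting the values for $\Delta$'s from (24) and (25) we get
\[ {}_{2r}\sigma_y^2 - {}_{2r-1}\sigma_y^2 = \frac{\sigma^2}{N\,(r!)^2\, 2^{2r}}\left\{\sum_{s=0}^{s=r} (-1)^s \beta_{r,s}\, x^{2s}\, \frac{{}^{1}_{r+1}\Pi_{s+1,r+1}}{\sqrt{{}^{1}_{r}\Pi \cdot {}^{1}_{r+1}\Pi}}\right\}^2 \]
and
\[ {}_{2r-1}\sigma_y^2 - {}_{2r-2}\sigma_y^2 = \frac{\sigma^2 x^2}{N\,((r-1)!)^2\, 2^{2r-2}}\left\{\sum_{s=0}^{s=r-1} (-1)^s \beta_{r-1,s}\, x^{2s}\, \frac{{}^{2}_{r}\Pi_{s+1,r}}{\sqrt{{}^{2}_{r-1}\Pi \cdot {}^{2}_{r}\Pi}}\right\}^2, \]
or, as
\[ \frac{{}^{1}_{r+1}\Pi_{s+1,r+1}}{\sqrt{{}^{1}_{r}\Pi \cdot {}^{1}_{r+1}\Pi}} = \frac{e_{s+1,r+1}}{C_{s+1}\sqrt{e_{r+1,r+1}}} = \sqrt{4r+1}\,\cdot (2s+1)(2s+3)\cdots(2s+2r-1) \]
and
\[ \frac{{}^{2}_{r}\Pi_{s+1,r}}{\sqrt{{}^{2}_{r-1}\Pi \cdot {}^{2}_{r}\Pi}} = \frac{e_{s+1,r}}{C_{s+1}\sqrt{e_{r,r}}} = \sqrt{4r-1}\,\cdot (2s+3)(2s+5)\cdots(2s+2r-1)\,{}^{*}, \]
\[ {}_{2r}\sigma_y^2 - {}_{2r-1}\sigma_y^2 = \frac{\sigma^2 (4r+1)}{N\,(r!)^2\, 2^{2r}}\left\{\sum_{s=0}^{s=r} \left[(-1)^s \beta_{r,s}\, x^{2s}\,(2s+1)(2s+3)\cdots(2s+2r-1)\right]\right\}^2 \qquad (32) \]
and
\[ {}_{2r-1}\sigma_y^2 - {}_{2r-2}\sigma_y^2 = \frac{\sigma^2 (4r-1)\, x^2}{N\,((r-1)!)^2\, 2^{2r-2}}\left\{\sum_{s=0}^{s=r-1} \left[(-1)^s \beta_{r-1,s}\, x^{2s}\,(2s+3)(2s+5)\cdots(2s+2r-1)\right]\right\}^2 \qquad (33), \]
which enables us to form ${}_n\sigma_y^2$ by successive summations from ${}_0\sigma_y^2 = \frac{\sigma^2}{N}$.

* The e's and C do not of course have the same value in the two equations as they represent columns
and elements in two different determinants.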
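
The sums in (32) and (33) are easy to evaluate mechanically. The sketch below does so in exact arithmetic, taking N = 1 and sigma = 1, and compares each term with (2n + 1) times the square of the Legendre polynomial of degree n. That comparison is an observation added here, not a statement made in the memoir, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def odd_run(first, last):
    """first * (first + 2) * ... * last; an empty run (first > last) gives 1."""
    out = 1
    for k in range(first, last + 1, 2):
        out *= k
    return out


def s_even(r, x):
    """(32) with N = sigma = 1: the term added on passing from degree 2r-1 to 2r."""
    total = sum((-1) ** s * comb(r, s) * x ** (2 * s) * odd_run(2 * s + 1, 2 * s + 2 * r - 1)
                for s in range(r + 1))
    return Fraction(4 * r + 1, factorial(r) ** 2 * 4 ** r) * total ** 2


def s_odd(r, x):
    """(33) with N = sigma = 1: the term added on passing from degree 2r-2 to 2r-1."""
    total = sum((-1) ** s * comb(r - 1, s) * x ** (2 * s) * odd_run(2 * s + 3, 2 * s + 2 * r - 1)
                for s in range(r))
    return Fraction(4 * r - 1, factorial(r - 1) ** 2 * 4 ** (r - 1)) * x ** 2 * total ** 2


def legendre(n, x):
    """Legendre polynomial P_n(x) by the three-term recurrence."""
    p_prev, p = Fraction(1), x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


if __name__ == "__main__":
    for x in (Fraction(0), Fraction(1, 3), Fraction(-2, 5), Fraction(1)):
        for r in range(1, 5):
            assert s_even(r, x) == (4 * r + 1) * legendre(2 * r, x) ** 2
            assert s_odd(r, x) == (4 * r - 1) * legendre(2 * r - 1, x) ** 2
    print("(32) and (33) reproduce (2n + 1) P_n(x)^2 for n = 1..8")
```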


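Before investigating the curve for ${}_n\sigma_y^2$ for special values of n we shall first look at ${}_n\sigma_y^2$
for x = 0 and x = ±1.
(9) From (33) we see that when x = 0
\[ {}_{2r-1}\sigma_y^2 = {}_{2r-2}\sigma_y^2. \]
${}_{2r}\sigma_y^2$ is for x = 0 most easily evaluated from the formula (13).


Remembering that in our case $\mu_{2r} = \frac{1}{2r+1}$ we find from this
\[ {}_{2p}\sigma_y^2\Big|_{x=0} = \frac{\sigma^2}{N}\cdot\frac{{}^{1}_{p+1}\Delta_{1,1}}{{}^{1}_{p+1}\Delta} \]
and hence by (24) and (25)
\[ {}_{2p+1}\sigma_y^2\Big|_{x=0} = {}_{2p}\sigma_y^2\Big|_{x=0} = \frac{\sigma^2}{N}\left\{\frac{3\cdot 5\cdot 7 \cdots (2p+1)}{2\cdot 4\cdot 6 \cdots 2p}\right\}^2 \qquad (34) \]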
(10) To evaluate ${}_n\sigma_y^2$ for x = ±1 we use (32) and (33). The sum in (32) may
be considered as
\[ \frac{d}{dx}\,\frac{1}{x}\,\frac{d}{dx}\,\frac{1}{x} \cdots \frac{1}{x}\,\frac{d\{x^{2r-1}(x^2-1)^r\}}{dx} \]
with a number r of differentiations. If these operations are undertaken directly
upon $x^{2r-1}(x^2-1)^r$ the result is
\[ a_r (x^2-1)^r + a_{r-1}(x^2-1)^{r-1} + \cdots + a_1 (x^2-1) + a_0, \]
of which only $a_0 = 2r(2r-2)\cdots 4\cdot 2 = r!\cdot 2^r$
remains for x = ±1.
Corresponding to this the sum in (33) comes out from
\[ \frac{d}{dx}\,\frac{1}{x}\,\frac{d}{dx} \cdots \frac{1}{x}\,\frac{d\{x^{2r-1}(x^2-1)^{r-1}\}}{dx} \]
by taking (r - 1) differentiations and therefore
equals, for x = ±1, $(2r-2)(2r-4)\cdots 4\cdot 2 = (r-1)!\cdot 2^{r-1}$.

Hence
\[ {}_{2r}\sigma_y^2\Big|_{x=1} - {}_{2r-1}\sigma_y^2\Big|_{x=1} = \frac{\sigma^2}{N}(4r+1) \]
and
\[ {}_{2r-1}\sigma_y^2\Big|_{x=1} - {}_{2r-2}\sigma_y^2\Big|_{x=1} = \frac{\sigma^2}{N}(4r-1), \]
or since ${}_0\sigma_y^2 = \frac{\sigma^2}{N}$,
\[ {}_n\sigma_y^2\Big|_{x=\pm 1} = \frac{\sigma^2}{N}\{1 + 3 + 5 + \cdots + (2n+1)\} = \frac{\sigma^2}{N}(n+1)^2. \qquad (35) \]
(11) In Section I under (7) it was found that
\[ \int f(x)\,\psi(x)\;{}_n\sigma_y^2\,dx = \sigma^2 (n+1) \]
when the integration was taken over the places of observation. For the present
distribution f(x) is 1, $\psi(x)$ constant and $\int\psi(x)\,dx = N$, hence the mean of ${}_n\sigma_y^2$ in
the range of observations is for a uniform continuous distribution
\[ \frac{\sigma^2}{N}(n+1). \]
For the grouped observations in Section II we find by integration of the
formulae for functions from the first to the sixth degree that
\[ \frac{1}{2}\int_{-1}^{1} {}_n\sigma_y^2\,dx = \frac{\sigma^2}{N}(n+1)\left(1 - \frac{1}{2n+1}\right). \]
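
The special values found under (9) and (10) can also be obtained without any of the determinant reductions, by inverting the matrix of moment coefficients directly. The sketch below is an independent check under stated assumptions: N = 1 and sigma = 1, the variance of the adjusted value at x is taken as v'M⁻¹v with v = (1, x, ..., xⁿ) and M the matrix of moments of the uniform distribution on (-1, 1). The helper names are hypothetical.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a c = b exactly by Gauss-Jordan elimination (a is a square list of Fractions)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[piv] = m[piv], m[c]
        m[c] = [v / m[c][c] for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n] for row in m]


def variance_factor(n, x):
    """N * (variance of the adjusted value at x) / sigma^2 for degree n, uniform observations."""
    def mu(k):                       # moment coefficients of the uniform distribution on (-1, 1)
        return Fraction(0) if k % 2 else Fraction(1, k + 1)
    powers = [x ** i for i in range(n + 1)]
    moments = [[mu(i + j) for j in range(n + 1)] for i in range(n + 1)]
    return sum(v * c for v, c in zip(powers, solve(moments, powers)))


def value_at_zero(p):
    """{3.5...(2p+1) / 2.4...2p}^2, the value at x = 0 for degrees 2p and 2p+1."""
    ratio = Fraction(1)
    for k in range(1, p + 1):
        ratio *= Fraction(2 * k + 1, 2 * k)
    return ratio ** 2


if __name__ == "__main__":
    for n in range(7):
        assert variance_factor(n, Fraction(1)) == (n + 1) ** 2
        assert variance_factor(n, Fraction(0)) == value_at_zero(n // 2)
    print("values at x = 0 and x = 1 confirmed for n = 0..6")
```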


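IV. Uniform continuous distribution of observations with constant standard
deviation. Special formulae.
(1) Let ${}_n\sigma_y^2 - {}_{n-1}\sigma_y^2$ be indicated by $S_n$, then the formulae (32) and (33)
give us
\[
\begin{aligned}
S_1 &= \frac{\sigma^2}{N}\cdot 3x^2 \\
S_2 &= \frac{\sigma^2}{N}\cdot\frac{5}{4}\,(1 - 3x^2)^2 \\
S_3 &= \frac{\sigma^2}{N}\cdot\frac{7}{4}\,x^2 (3 - 5x^2)^2 \\
S_4 &= \frac{\sigma^2}{N}\cdot\frac{9}{64}\,(3 - 30x^2 + 35x^4)^2 \\
S_5 &= \frac{\sigma^2}{N}\cdot\frac{11}{64}\,x^2 (15 - 70x^2 + 63x^4)^2 \\
S_6 &= \frac{\sigma^2}{N}\cdot\frac{13}{9\times 256}\,(15 - 315x^2 + 945x^4 - 693x^6)^2
\end{aligned}
\qquad (36),
\]
from which we form ${}_n\sigma_y^2$ beginning with
\[
\begin{aligned}
{}_0\sigma_y^2 &= \frac{\sigma^2}{N} \\
{}_1\sigma_y^2 &= \frac{\sigma^2}{N}\,(1 + 3x^2) \\
{}_2\sigma_y^2 &= \frac{\sigma^2}{N}\left(1 + 3x^2 + \tfrac{5}{4}(1 - 3x^2)^2\right) = \frac{\sigma^2}{N}\cdot\frac{1}{4}\,(9 - 18x^2 + 45x^4),
\end{aligned}
\]
and further in the same way
\[
\begin{aligned}
{}_3\sigma_y^2 &= \frac{\sigma^2}{N}\cdot\frac{1}{4}\,(9 + 45x^2 - 165x^4 + 175x^6) \\
{}_4\sigma_y^2 &= \frac{\sigma^2}{N}\cdot\frac{25}{64}\,(9 - 36x^2 + 294x^4 - 644x^6 + 441x^8) \\
{}_5\sigma_y^2 &= \frac{\sigma^2}{N}\cdot\frac{9}{64}\,(25 + 175x^2 - 1750x^4 + 6510x^6 - 9555x^8 + 4851x^{10}) \\
{}_6\sigma_y^2 &= \frac{\sigma^2}{N}\cdot\frac{7}{256}\,(175 - 1050x^2 + 17325x^4 - 93660x^6 + 225225x^8 - 245322x^{10} + 99099x^{12})
\end{aligned}
\qquad (37).
\]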
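
The coefficients in (37) can be regenerated by machine. The sketch below builds, with N = 1 and sigma = 1, the sum of (2k + 1) times the squared Legendre polynomial of degree k for k = 0, ..., n, as a list of exact coefficients in ascending powers of x. Writing the terms of (36) as multiples of squared Legendre polynomials is an identification made for this illustration, and the function names are hypothetical.

```python
from fractions import Fraction


def legendre_coeffs(n):
    """Coefficients of the Legendre polynomial P_n in ascending powers of x."""
    p_prev, p = [Fraction(1)], [Fraction(0), Fraction(1)]
    if n == 0:
        return p_prev
    for k in range(1, n):
        nxt = [Fraction(2 * k + 1, k + 1) * c for c in [Fraction(0)] + p]   # (2k+1)/(k+1) * x * P_k
        for i, c in enumerate(p_prev):
            nxt[i] -= Fraction(k, k + 1) * c                                # - k/(k+1) * P_(k-1)
        p_prev, p = p, nxt
    return p


def poly_mul(a, b):
    """Product of two polynomials given as ascending coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def variance_poly(n):
    """Ascending coefficients of N * (variance at x) / sigma^2 for degree n, uniform observations."""
    total = [Fraction(0)] * (2 * n + 1)
    for k in range(n + 1):
        pk = legendre_coeffs(k)
        for i, c in enumerate(poly_mul(pk, pk)):
            total[i] += (2 * k + 1) * c
    return total


if __name__ == "__main__":
    print([str(c) for c in variance_poly(2)])   # coefficients of 1, x, x^2, x^3, x^4 for degree 2
```

Only even powers occur, so `variance_poly(n)[::2]` lists the coefficients of 1, x², x⁴, ... for comparison with (37).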








(2) Since ${}_n\sigma_y^2 = {}_{n-1}\sigma_y^2 + S_n$ the curve for ${}_n\sigma_y^2$ is entirely above the ${}_{n-1}\sigma_y^2$ curve
except where $S_n = 0$.


Solving the equations $S_n = 0$ the following roots are found:


For S_1 = 0    x = 0
 ,, S_2 = 0    x = ±√(1/3) = ±.5773
 ,, S_3 = 0    x = 0,  x = ±√(3/5) = ±.7746
 ,, S_4 = 0    x = ±√((15 ± 2√30)/35) = ±.8611, ±.3400
 ,, S_5 = 0    x = 0,  x = ±√((35 ± 2√70)/63) = ±.9062, ±.5385
 ,, S_6 = 0    x = ±.2386, ±.6612, ±.9325


Since all the roots are real and all lie between -1 and +1, ${}_n\sigma_y^2$ therefore
equals ${}_{n-1}\sigma_y^2$ for n values of x all of which are inside the range of the observations.

The adjusted values of the functions at these abscissae appear to be of special
interest since they are uncorrelated as was shown in Section I under (9).
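
The closed forms of the roots can be checked by substituting them into the polynomial factors of S_2 to S_5 as they stand in (36). The short sketch below does this in floating point; the dictionary names are hypothetical, and the polynomial factors are restated in the code so that it does not depend on how (36) is printed.

```python
from math import sqrt

# Polynomial factors of S_2 ... S_5 (the squared brackets of (36), with the factor x kept).
brackets = {
    2: lambda x: 1 - 3 * x ** 2,
    3: lambda x: x * (3 - 5 * x ** 2),
    4: lambda x: 3 - 30 * x ** 2 + 35 * x ** 4,
    5: lambda x: x * (15 - 70 * x ** 2 + 63 * x ** 4),
}

# Non-negative roots in closed form; the negative roots follow by symmetry.
roots = {
    2: [sqrt(1 / 3)],
    3: [0.0, sqrt(3 / 5)],
    4: [sqrt((15 + 2 * sqrt(30)) / 35), sqrt((15 - 2 * sqrt(30)) / 35)],
    5: [0.0, sqrt((35 + 2 * sqrt(70)) / 63), sqrt((35 - 2 * sqrt(70)) / 63)],
}

if __name__ == "__main__":
    for n, xs in roots.items():
        for x in xs:
            assert abs(brackets[n](x)) < 1e-9      # each closed form really is a root
            assert 0.0 <= x < 1.0                  # and lies inside the range
        print(n, [round(x, 4) for x in xs])
```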

(3) Looking at Diagram 2, which gives the curves of ${}_n\sigma_y$ up to n = 6, it is seen,
as was also clear from the formulae for ${}_n\sigma_y$ at x = 0 and at x = 1 given in the last section, that
while the standard deviation in the middle of the range increases slowly with the
degree of function it increases very rapidly at the ends of the range. At x = 0
the curve has a minimum when the degree of function is odd and a maximum when
it is even. Besides that the curve has (2n - 2) maxima and minima between
-1 and 1. As the curve for ${}_n\sigma_y^2$ is of the 2nth degree, ${}_n\sigma_y^2$ is therefore increasing
for x increasing above 1 or for x decreasing below -1.


The abscissae of the maxima and minima are given in the following table.

Degree of
function   Abscissae of maxima                          Abscissae of minima
   1                                                    0
   2       0                                            ±√(1/5) = ±.4472
   3       ±√(1/5) = ±.4472                             0, ±√(3/7) = ±.6547
   4       0, ±√(3/7) = ±.6547                          ±√((7 ± 2√7)/21) = ±.2852, ±.7651
   5       ±√((7 ± 2√7)/21) = ±.2852, ±.7651            0, ±√((15 ± 2√15)/33) = ±.4689, ±.8302
   6       0, ±√((15 ± 2√15)/33) = ±.4689, ±.8302       ±.2093, ±.5917, ±.8718

Hence the curve for ${}_{n+1}\sigma_y$ has a maximum for the abscissae at which ${}_n\sigma_y$ has
a minimum. A comparison with the results in Section II shows that the abscissae
of the maxima found here are the same as those of the best places of observation
for (n + 1) equally big groups of observations of a function of the nth degree.
These places tally with the places where ${}_n\sigma_y$ was a maximum. Thus if we
imagine that we had started the investigations with a uniform distribution of
observations, and to lower the maxima of the curve of standard deviation had put
clusters of observations at those maxima and at the ends of the range we should not
get the best curve of standard deviation till all the observations of the continuous
distribution had been distributed at the n - 1 places of maxima and at 1 and -1.


The minima of the standard deviations obtained from a uniform continuous 
distribution and the (n + 1) best groups of observations do not fall at the same 
abscissae. 


(4) The curves are very far from our ideal of a constant standard deviation 
throughout the range. To obtain the same maximum of standard deviation as 
(n + 1) groups could give us we should have to limit the part of the range used to 
the following fractions of the range: 


for 1st degree   .58
 ,, 2nd   ,,     .73
 ,, 3rd   ,,     .80
 ,, 4th   ,,     .84
 ,, 5th   ,,     .83
 ,, 6th   ,,     .73


It is not likely that the range of values of the function which we investigate 
would only be of interest inside a range so much smaller than that within which 
we might actually observe; further it seems likely that observations all of which 
were taken inside the smaller part of the range would give better information for 
that special interval. I shall therefore examine in the following sections whether a
uniform distribution of observations to which are added clusters of observations at
the ends of the range will not give a more satisfactory curve of standard deviations.


V. Uniform continuous distribution of observations with additional observations 
clustered at the ends of the range; constant standard deviation of observations. 
General formulae. 


(1) Suppose we have $\frac{N}{1+a}$ observations uniformly distributed from -1
to 1 and besides $\frac{Na}{2(1+a)}$ observations at -1 and the same number at 1. We
then have
\[ \mu_{2r} = \frac{1}{N}\left\{\int_{-1}^{+1}\frac{N x^{2r}}{2(1+a)}\,dx + \frac{Na}{1+a}\right\} = \frac{1}{1+a}\left(\frac{1}{2r+1} + a\right) \]
and $\mu_{2r+1} = 0$.
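According to (13) and (14) we find,
\[
{}_{2p}\sigma_y^2 = -\frac{\sigma^2 (1+a)}{N}\left\{
\frac{\begin{vmatrix} 0 & 1 & x^2 & \cdots & x^{2p} \\ 1 & 1+a & \frac13+a & \cdots & \frac{1}{2p+1}+a \\ x^2 & \frac13+a & \frac15+a & \cdots & \frac{1}{2p+3}+a \\ \vdots & \vdots & \vdots & & \vdots \\ x^{2p} & \frac{1}{2p+1}+a & \frac{1}{2p+3}+a & \cdots & \frac{1}{4p+1}+a \end{vmatrix}}
{\begin{vmatrix} 1+a & \frac13+a & \cdots & \frac{1}{2p+1}+a \\ \frac13+a & \frac15+a & \cdots & \frac{1}{2p+3}+a \\ \vdots & \vdots & & \vdots \\ \frac{1}{2p+1}+a & \frac{1}{2p+3}+a & \cdots & \frac{1}{4p+1}+a \end{vmatrix}}
+ x^2\,
\frac{\begin{vmatrix} 0 & 1 & x^2 & \cdots & x^{2p-2} \\ 1 & \frac13+a & \frac15+a & \cdots & \frac{1}{2p+1}+a \\ x^2 & \frac15+a & \frac17+a & \cdots & \frac{1}{2p+3}+a \\ \vdots & \vdots & \vdots & & \vdots \\ x^{2p-2} & \frac{1}{2p+1}+a & \frac{1}{2p+3}+a & \cdots & \frac{1}{4p-1}+a \end{vmatrix}}
{\begin{vmatrix} \frac13+a & \frac15+a & \cdots & \frac{1}{2p+1}+a \\ \vdots & \vdots & & \vdots \\ \frac{1}{2p+1}+a & \frac{1}{2p+3}+a & \cdots & \frac{1}{4p-1}+a \end{vmatrix}}
\right\} \qquad (38)
\]
and
\[
{}_{2p-1}\sigma_y^2 = -\frac{\sigma^2 (1+a)}{N}\left\{
\frac{\begin{vmatrix} 0 & 1 & x^2 & \cdots & x^{2p-2} \\ 1 & 1+a & \frac13+a & \cdots & \frac{1}{2p-1}+a \\ x^2 & \frac13+a & \frac15+a & \cdots & \frac{1}{2p+1}+a \\ \vdots & \vdots & \vdots & & \vdots \\ x^{2p-2} & \frac{1}{2p-1}+a & \frac{1}{2p+1}+a & \cdots & \frac{1}{4p-3}+a \end{vmatrix}}
{\begin{vmatrix} 1+a & \frac13+a & \cdots & \frac{1}{2p-1}+a \\ \vdots & \vdots & & \vdots \\ \frac{1}{2p-1}+a & \frac{1}{2p+1}+a & \cdots & \frac{1}{4p-3}+a \end{vmatrix}}
+ x^2\,
\frac{\begin{vmatrix} 0 & 1 & x^2 & \cdots & x^{2p-2} \\ 1 & \frac13+a & \frac15+a & \cdots & \frac{1}{2p+1}+a \\ x^2 & \frac15+a & \frac17+a & \cdots & \frac{1}{2p+3}+a \\ \vdots & \vdots & \vdots & & \vdots \\ x^{2p-2} & \frac{1}{2p+1}+a & \frac{1}{2p+3}+a & \cdots & \frac{1}{4p-1}+a \end{vmatrix}}
{\begin{vmatrix} \frac13+a & \frac15+a & \cdots & \frac{1}{2p+1}+a \\ \vdots & \vdots & & \vdots \\ \frac{1}{2p+1}+a & \frac{1}{2p+3}+a & \cdots & \frac{1}{4p-1}+a \end{vmatrix}}
\right\} \qquad (39)
\]
and, according to I under (6),
\[
{}_{2p}\sigma_y^2 - {}_{2p-1}\sigma_y^2 = \frac{\sigma^2}{N}(1+a)\,
\frac{\begin{vmatrix} 1 & 1+a & \frac13+a & \cdots & \frac{1}{2p-1}+a \\ x^2 & \frac13+a & \frac15+a & \cdots & \frac{1}{2p+1}+a \\ \vdots & \vdots & \vdots & & \vdots \\ x^{2p} & \frac{1}{2p+1}+a & \frac{1}{2p+3}+a & \cdots & \frac{1}{4p-1}+a \end{vmatrix}^2}
{\begin{vmatrix} 1+a & \frac13+a & \cdots & \frac{1}{2p-1}+a \\ \vdots & \vdots & & \vdots \\ \frac{1}{2p-1}+a & \frac{1}{2p+1}+a & \cdots & \frac{1}{4p-3}+a \end{vmatrix}\cdot
 \begin{vmatrix} 1+a & \frac13+a & \cdots & \frac{1}{2p+1}+a \\ \vdots & \vdots & & \vdots \\ \frac{1}{2p+1}+a & \frac{1}{2p+3}+a & \cdots & \frac{1}{4p+1}+a \end{vmatrix}}
\qquad (40)
\]
and
\[
{}_{2p+1}\sigma_y^2 - {}_{2p}\sigma_y^2 = \frac{\sigma^2}{N}(1+a)\,x^2\,
\frac{\begin{vmatrix} 1 & \frac13+a & \frac15+a & \cdots & \frac{1}{2p+1}+a \\ x^2 & \frac15+a & \frac17+a & \cdots & \frac{1}{2p+3}+a \\ \vdots & \vdots & \vdots & & \vdots \\ x^{2p} & \frac{1}{2p+3}+a & \frac{1}{2p+5}+a & \cdots & \frac{1}{4p+1}+a \end{vmatrix}^2}
{\begin{vmatrix} \frac13+a & \frac15+a & \cdots & \frac{1}{2p+1}+a \\ \vdots & \vdots & & \vdots \\ \frac{1}{2p+1}+a & \frac{1}{2p+3}+a & \cdots & \frac{1}{4p-1}+a \end{vmatrix}\cdot
 \begin{vmatrix} \frac13+a & \frac15+a & \cdots & \frac{1}{2p+3}+a \\ \vdots & \vdots & & \vdots \\ \frac{1}{2p+3}+a & \frac{1}{2p+5}+a & \cdots & \frac{1}{4p+3}+a \end{vmatrix}}
\qquad (41)
\]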
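(2) For the reduction of these formulae we have to evaluate the determinant
\[
{}^{q}_{p}\delta = \begin{vmatrix}
\frac{1}{2q-1}+a & \frac{1}{2q+1}+a & \cdots & \frac{1}{2q+2p-3}+a \\
\frac{1}{2q+1}+a & \frac{1}{2q+3}+a & \cdots & \frac{1}{2q+2p-1}+a \\
\vdots & \vdots & & \vdots \\
\frac{1}{2q+2p-3}+a & \frac{1}{2q+2p-1}+a & \cdots & \frac{1}{2q+4p-5}+a
\end{vmatrix}.
\]
By subtracting from the elements of each row the elements of the preceding
and leaving the first row as it is, it is transformed to
\[
{}^{q}_{p}\delta = (-1)^{p-1} \times \begin{vmatrix}
\frac{1}{2q-1}+a & \frac{1}{2q+1}+a & \cdots & \frac{1}{2q+2p-3}+a \\
\frac{2}{(2q-1)(2q+1)} & \frac{2}{(2q+1)(2q+3)} & \cdots & \frac{2}{(2q+2p-3)(2q+2p-1)} \\
\vdots & \vdots & & \vdots \\
\frac{2}{(2q+2p-5)(2q+2p-3)} & \frac{2}{(2q+2p-3)(2q+2p-1)} & \cdots & \frac{2}{(2q+4p-7)(2q+4p-5)}
\end{vmatrix},
\]
which when the columns undergo the same process takes the form
\[
{}^{q}_{p}\delta = \begin{vmatrix}
\frac{1}{2q-1}+a & \frac{2}{(2q-1)(2q+1)} & \frac{2}{(2q+1)(2q+3)} & \cdots & \frac{2}{(2q+2p-5)(2q+2p-3)} \\
\frac{2}{(2q-1)(2q+1)} & \frac{2\cdot 4}{(2q-1)(2q+1)(2q+3)} & \frac{2\cdot 4}{(2q+1)(2q+3)(2q+5)} & \cdots & \frac{2\cdot 4}{(2q+2p-5)\cdots(2q+2p-1)} \\
\frac{2}{(2q+1)(2q+3)} & \frac{2\cdot 4}{(2q+1)(2q+3)(2q+5)} & \frac{2\cdot 4}{(2q+3)(2q+5)(2q+7)} & \cdots & \frac{2\cdot 4}{(2q+2p-3)\cdots(2q+2p+1)} \\
\vdots & \vdots & \vdots & & \vdots \\
\frac{2}{(2q+2p-5)(2q+2p-3)} & \frac{2\cdot 4}{(2q+2p-5)\cdots(2q+2p-1)} & \frac{2\cdot 4}{(2q+2p-3)\cdots(2q+2p+1)} & \cdots & \frac{2\cdot 4}{(2q+4p-9)\cdots(2q+4p-5)}
\end{vmatrix}.
\]
Let us introduce the notation
\[
{}^{q}_{p}D = \begin{vmatrix}
\frac{1}{(2q-1)(2q+1)(2q+3)} & \frac{1}{(2q+1)(2q+3)(2q+5)} & \cdots & \frac{1}{(2q+2p-3)\cdots(2q+2p+1)} \\
\frac{1}{(2q+1)(2q+3)(2q+5)} & \frac{1}{(2q+3)(2q+5)(2q+7)} & \cdots & \frac{1}{(2q+2p-1)\cdots(2q+2p+3)} \\
\vdots & \vdots & & \vdots \\
\frac{1}{(2q+2p-3)\cdots(2q+2p+1)} & \frac{1}{(2q+2p-1)\cdots(2q+2p+3)} & \cdots & \frac{1}{(2q+4p-5)\cdots(2q+4p-1)}
\end{vmatrix}.
\]
Then, since for a = 0 ${}^{q}_{p}\delta$ equals the determinant ${}^{q}_{p}\Delta$, we have
\[ {}^{q}_{p}\delta = {}^{q}_{p}\Delta + a\cdot 2^{3(p-1)}\cdot {}^{q}_{p-1}D \qquad (42) \]
and the problem is reduced to the evaluation of ${}^{q}_{p}D$.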
(3) It shall be proved by induction that
\[
{}^{q}_{p}D = \frac{\{1^{p}\cdot 2^{p-1} \cdots (p-1)^2\cdot p\}^2 \cdot 2^{p(p-2)}\,(p+1)}
{(2q-1)(2q+1)^2(2q+3)^3 \cdots (2q+2p-5)^{p-1}(2q+2p-3)^{p}(2q+2p-1)^{p}(2q+2p+1)^{p}(2q+2p+3)^{p-1} \cdots (2q+4p-3)^2(2q+4p-1)}.
\qquad (43)
\]
It contains the 2p + 1 different factors of the elements with indices increasing
from 1 at the extreme to p in the middle so that the three factors of which the one
diagonal line of the determinant consists occur with the index p.


For p = 1 the formula gives
\[ {}^{q}_{1}D = \frac{1}{(2q-1)(2q+1)(2q+3)} \]
as it ought to.
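
Before the induction is carried through, the relation between the determinant with the added constant a and the determinants without it, together with formula (43), can be tested for small orders. The sketch below assumes the layouts stated in its comments (elements 1/(2q + 2(i + j) - 1) + a, and reciprocals of three consecutive odd factors for D); the function names are hypothetical. The last assertion, that the determinant with a equals the one without a times 1 + ap(2q + 2p - 3), is a consequence worked out for this illustration.

```python
from fractions import Fraction
from math import prod


def det(m):
    """Exact determinant of a square matrix of Fractions; the empty matrix gives 1."""
    m = [row[:] for row in m]
    n, d = len(m), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return d


def small_delta(q, p, a):
    """Determinant of order p with elements 1/(2q + 2(i+j) - 1) + a."""
    return det([[Fraction(1, 2 * q + 2 * (i + j) - 1) + a for j in range(p)] for i in range(p)])


def big_d(q, p):
    """Determinant of order p with elements 1/(c (c+2) (c+4)), c = 2q - 1 + 2(i+j)."""
    def e(k):
        c = 2 * q - 1 + 2 * k
        return Fraction(1, c * (c + 2) * (c + 4))
    return det([[e(i + j) for j in range(p)] for i in range(p)])


def formula_43(q, p):
    """Right-hand side of (43) as reconstructed here."""
    num = prod(k ** (p + 1 - k) for k in range(1, p + 1)) ** 2 * Fraction(2) ** (p * (p - 2)) * (p + 1)
    den = prod((2 * q - 1 + 2 * k) ** min(k + 1, 2 * p + 1 - k, p) for k in range(2 * p + 1))
    return num / den


if __name__ == "__main__":
    for q in (1, 2):
        for p in range(1, 5):
            assert big_d(q, p) == formula_43(q, p)
            for a in (Fraction(1, 7), Fraction(2)):
                plain = small_delta(q, p, Fraction(0))
                assert small_delta(q, p, a) == plain + a * 8 ** (p - 1) * big_d(q, p - 1)
                assert small_delta(q, p, a) == plain * (1 + a * p * (2 * q + 2 * p - 3))
    print("relations confirmed for q = 1, 2 and p = 1..4")
```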


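As the determinant is orthosymmetrical the relation
\[ \Delta = \frac{\Delta_{ss}\,\Delta_{s's'} - \Delta_{ss'}^2}{\Delta_{ss\,s's'}} \]
holds.


Applied on ${}^{q}_{p+1}D$ for s = 1 and s' = p + 1 it may be written
\[ {}^{q}_{p+1}D = \frac{{}^{q}_{p}D \cdot {}^{q+2}_{p}D - \left({}^{q+1}_{p}D\right)^2}{{}^{q+2}_{p-1}D}. \qquad (44) \]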
Looking first at the numerator of (43) we see that it has the same value for the 
two terms of the numerator of (44), and divided by the corresponding factor of 


a+2, 
»-1D it becomes 
Fj - 2819-9 ,..... (p — 2)*(p — 1)4 p* *(p + 1)? 22 (p-2)—(p—1) (p—3) 
[1 Q°-2....., (p — 2)? (p — 1) > : 
reeeee (p — 2)*(p— 1)? p? (p+ art 27-8, 





= {]P+l 9p 
{1?+1 2 


q , 

To evaluate the factor in ,,,D arising from the denominator of (43) we shall 
give a table of the indices with which the different factors occur in the D’s and 
their ratios. 


2q-12g+1 2q+3 ... 2g+2p-5 2q+2p-3 2%q+2%p-12q+2p+1 2q+2p+3 A%+2p+5 W+2p+7 ... 2q+4p—-1 2q+4p+1 2+4p+3 


> | a See Pe | p p p p-l p-2 p-3 «.. 1 _— _— 
+2 
eet — — 1... p-3 p-2 p-l p p p O-1 sco 3 2 1 
qt2 
p-D — —-— | w o-8 p-2 p-1l p-1 p-1l p-2 p-3 «. 1 _— _ 
4 


2(p- 1) 2(p- 2)... 


~ 


+3 
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Hence the factor arising from the denominator of (43) is 
\[
\frac{(2q+2p-1)(2q+2p+3) - (2q-1)(2q+4p+3)}
{(2q-1)(2q+1)^2 \cdots (2q+2p-3)^{p}(2q+2p-1)^{p+1}(2q+2p+1)^{p+1}(2q+2p+3)^{p+1}(2q+2p+5)^{p} \cdots (2q+4p+1)^2(2q+4p+3)}.
\]
The numerator of this equals 4p (p + 2),
multiplying with the factor previously found we therefore get
\[
{}^{q}_{p+1}D = \frac{\{1^{p+1}\cdot 2^{p} \cdots (p-1)^3\, p^2\, (p+1)\}^2 \cdot 2^{(p+1)(p-1)}\,(p+2)}
{(2q-1)(2q+1)^2 \cdots (2q+2p-3)^{p}(2q+2p-1)^{p+1}(2q+2p+1)^{p+1}(2q+2p+3)^{p+1}(2q+2p+5)^{p} \cdots (2q+4p+1)^2(2q+4p+3)},
\]
which is what we wanted to prove.

(4) When the values of $\Delta$ and D are introduced in (42) we get
\[
\begin{aligned}
{}^{q}_{p}\delta ={}& \frac{\{1^{p-1}\cdot 2^{p-2} \cdots (p-2)^2 (p-1)\}^2 \cdot 2^{p(p-1)}}
{(2q-1)(2q+1)^2 \cdots (2q+2p-5)^{p-1}(2q+2p-3)^{p}(2q+2p-1)^{p-1} \cdots (2q+4p-7)^2(2q+4p-5)} \\
&+ a\cdot 2^{3(p-1)} \times \frac{\{1^{p-1}\cdot 2^{p-2} \cdots (p-2)^2 (p-1)\}^2 \cdot 2^{(p-1)(p-3)}\cdot p}
{(2q-1)(2q+1)^2 \cdots (2q+2p-7)^{p-2}(2q+2p-5)^{p-1}(2q+2p-3)^{p-1}(2q+2p-1)^{p-1}(2q+2p+1)^{p-2} \cdots (2q+4p-7)^2(2q+4p-5)} \\
={}& \frac{\{1^{p-1}\cdot 2^{p-2} \cdots (p-2)^2 (p-1)\}^2 \cdot 2^{p(p-1)} \cdot [1 + ap(2q+2p-3)]}
{(2q-1)(2q+1)^2 \cdots (2q+2p-5)^{p-1}(2q+2p-3)^{p}(2q+2p-1)^{p-1} \cdots (2q+4p-7)^2(2q+4p-5)}.
\end{aligned}
\qquad (45)
\]

The denominators of the formulae (38)—(41) for ${}_n\sigma_y^2$ are now known since they
only consist of the factors ${}^{1}\delta$ and ${}^{2}\delta$. To be able to write down the general expression
for ${}_n\sigma_y^2$, we should have to evaluate the minors of $\delta$, but their form is so complicated
that a direct calculation of the determinants for the degrees of function in question
appears to be simpler. With the material in hand we are however able to deter-
mine ${}_n\sigma_y^2$ for x = 0 and x² = 1.

(5) From (38) and (39) we see that
\[ {}_{2p}\sigma_y^2\Big|_{x=0} = {}_{2p+1}\sigma_y^2\Big|_{x=0} = \frac{\sigma^2}{N}(1+a)\,\frac{{}^{3}_{p}\delta}{{}^{1}_{p+1}\delta}, \]
and with the $\delta$'s as given by (45)
\[
\begin{aligned}
{}_{2p}\sigma_y^2\Big|_{x=0} = {}_{2p+1}\sigma_y^2\Big|_{x=0}
&= \frac{\sigma^2 (1+a)\cdot 3^2\cdot 5^2 \cdots (2p-1)^2 (2p+1)^2 \cdot [1 + ap(2p+3)]}{N\,\{1\cdot 2\cdot 3 \cdots p\}^2 \cdot 2^{2p}\cdot [1 + a(p+1)(2p+1)]} \\
&= \frac{\sigma^2}{N}\left\{\frac{3\cdot 5 \cdots (2p-1)(2p+1)}{2\cdot 4 \cdots (2p-2)\cdot 2p}\right\}^2 \frac{(1+a)\,[1 + ap(2p+3)]}{[1 + a(p+1)(2p+1)]}.
\end{aligned}
\qquad (46)
\]
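(6) To find ${}_n\sigma_y^2$ for x = 1 we have to evaluate the determinant of (p + 1)st order,
\[
\begin{vmatrix}
0 & 1 & 1 & \cdots & 1 \\
1 & \frac{1}{2q-1}+a & \frac{1}{2q+1}+a & \cdots & \frac{1}{2q+2p-3}+a \\
1 & \frac{1}{2q+1}+a & \frac{1}{2q+3}+a & \cdots & \frac{1}{2q+2p-1}+a \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \frac{1}{2q+2p-3}+a & \frac{1}{2q+2p-1}+a & \cdots & \frac{1}{2q+4p-5}+a
\end{vmatrix}.
\]
Treating it as $\delta$ was treated under (2) of this section, except that now two
rows or columns are left unaltered, it takes the form
\[
\begin{vmatrix}
0 & 1 & 0 & \cdots & 0 \\
1 & \frac{1}{2q-1}+a & \frac{2}{(2q-1)(2q+1)} & \cdots & \frac{2}{(2q+2p-5)(2q+2p-3)} \\
0 & \frac{2}{(2q-1)(2q+1)} & \frac{2\cdot 4}{(2q-1)(2q+1)(2q+3)} & \cdots & \frac{2\cdot 4}{(2q+2p-5)\cdots(2q+2p-1)} \\
0 & \frac{2}{(2q+1)(2q+3)} & \frac{2\cdot 4}{(2q+1)(2q+3)(2q+5)} & \cdots & \frac{2\cdot 4}{(2q+2p-3)\cdots(2q+2p+1)} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & \frac{2}{(2q+2p-5)(2q+2p-3)} & \frac{2\cdot 4}{(2q+2p-5)\cdots(2q+2p-1)} & \cdots & \frac{2\cdot 4}{(2q+4p-9)\cdots(2q+4p-5)}
\end{vmatrix}
= -2^{3(p-1)}\cdot {}^{q}_{p-1}D.
\]

Hence we find from (38),
\[ {}_{2p}\sigma_y^2\Big|_{x=1} = \frac{\sigma^2}{N}(1+a)\left\{\frac{2^{3p}\cdot {}^{1}_{p}D}{{}^{1}_{p+1}\delta} + \frac{2^{3(p-1)}\cdot {}^{2}_{p-1}D}{{}^{2}_{p}\delta}\right\}. \]
Now from (43) and (45) we get
\[ \frac{2^{3p}\cdot {}^{q}_{p}D}{{}^{q}_{p+1}\delta} = \frac{(p+1)(2q+2p-1)}{[1 + a(p+1)(2q+2p-1)]}, \qquad (47) \]
hence
\[ {}_{2p}\sigma_y^2\Big|_{x=1} = \frac{\sigma^2}{N}(1+a)\left\{\frac{(p+1)(2p+1)}{1 + a(p+1)(2p+1)} + \frac{p(2p+1)}{1 + ap(2p+1)}\right\}, \]
or
\[ {}_{2p}\sigma_y^2\Big|_{x=1} = \frac{\sigma^2}{N}(1+a)(2p+1)\left\{\frac{p+1}{1 + a(p+1)(2p+1)} + \frac{p}{1 + ap(2p+1)}\right\}. \qquad (48) \]

In the same way we get from (39),
\[ {}_{2p-1}\sigma_y^2\Big|_{x=1} = \frac{\sigma^2}{N}(1+a)\left\{\frac{2^{3(p-1)}\cdot {}^{1}_{p-1}D}{{}^{1}_{p}\delta} + \frac{2^{3(p-1)}\cdot {}^{2}_{p-1}D}{{}^{2}_{p}\delta}\right\}, \]
which by the relation between D and $\delta$ just found is reduced to
\[ {}_{2p-1}\sigma_y^2\Big|_{x=1} = \frac{\sigma^2}{N}(1+a)\left\{\frac{p(2p-1)}{1 + ap(2p-1)} + \frac{p(2p+1)}{1 + ap(2p+1)}\right\}. \qquad (49) \]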
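Both (48) and (49) are covered by the formula
\[ {}_{n}\sigma_y^2\Big|_{x=1} = \frac{\sigma^2}{N}(1+a)\left\{\frac{n(n+1)}{2 + an(n+1)} + \frac{(n+1)(n+2)}{2 + a(n+1)(n+2)}\right\}. \qquad (50) \]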
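
The values at x = 0 and x = 1 for the mixed distribution can be checked without any determinant reduction by inverting the matrix of moment coefficients. The sketch below assumes N = 1, sigma = 1, and even moments equal to (1/(2r + 1) + a)/(1 + a); the two closed forms tested are written out in the code (the value at x = 0 as in (46), and a single expression in n for x = 1 that reduces to (48) for even n and to (49) for odd n). The helper names are hypothetical.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a c = b exactly by Gauss-Jordan elimination (a is a square list of Fractions)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[piv] = m[piv], m[c]
        m[c] = [v / m[c][c] for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n] for row in m]


def variance_factor(n, x, a):
    """N * (variance of the adjusted value at x) / sigma^2, uniform part plus end clusters."""
    def mu(k):                       # moments: uniform share 1/(1+a), clusters a/(1+a) at the ends
        return Fraction(0) if k % 2 else (Fraction(1, k + 1) + a) / (1 + a)
    powers = [x ** i for i in range(n + 1)]
    moments = [[mu(i + j) for j in range(n + 1)] for i in range(n + 1)]
    return sum(v * c for v, c in zip(powers, solve(moments, powers)))


def at_zero(p, a):
    """Value at x = 0 for degrees 2p and 2p+1, as in (46)."""
    ratio = Fraction(1)
    for k in range(1, p + 1):
        ratio *= Fraction(2 * k + 1, 2 * k)
    return ratio ** 2 * (1 + a) * (1 + a * p * (2 * p + 3)) / (1 + a * (p + 1) * (2 * p + 1))


def at_one(n, a):
    """Value at x = 1 for degree n; agrees with (48) for even n and (49) for odd n."""
    return (1 + a) * (Fraction(n * (n + 1)) / (2 + a * n * (n + 1))
                      + Fraction((n + 1) * (n + 2)) / (2 + a * (n + 1) * (n + 2)))


if __name__ == "__main__":
    for a in (Fraction(0), Fraction(1, 10), Fraction(1, 2)):
        for n in range(6):
            assert variance_factor(n, Fraction(1), a) == at_one(n, a)
            assert variance_factor(n, Fraction(0), a) == at_zero(n // 2, a)
    print("values at x = 0 and x = 1 confirmed for n = 0..5")
```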


(7) The evaluation of ${}_n\sigma_y^2$ for special values of n can be made easier by a trans-
formation of the determinant
\[
{}^{q}_{p+1}d = \begin{vmatrix}
1 & \frac{1}{2q-1}+a & \frac{1}{2q+1}+a & \cdots & \frac{1}{2q+2p-3}+a \\
x^2 & \frac{1}{2q+1}+a & \frac{1}{2q+3}+a & \cdots & \frac{1}{2q+2p-1}+a \\
x^4 & \frac{1}{2q+3}+a & \frac{1}{2q+5}+a & \cdots & \frac{1}{2q+2p+1}+a \\
\vdots & \vdots & \vdots & & \vdots \\
x^{2p} & \frac{1}{2q+2p-1}+a & \frac{1}{2q+2p+1}+a & \cdots & \frac{1}{2q+4p-3}+a
\end{vmatrix}.
\]
Leaving the first row unaltered and subtracting from each of the others the
preceding we get a determinant the first column of which is
\[ 1,\; x^2-1,\; x^2(x^2-1),\; \ldots,\; x^{2p-2}(x^2-1), \]
while the other columns are identical with those of the determinant $\delta$ previously
treated in the same way. When next the two first rows are left as they are and
from each of the others is subtracted the preceding one the result is
\[
{}^{q}_{p+1}d = (-1)^{2p-1} \times \begin{vmatrix}
1 & \frac{1}{2q-1}+a & \frac{1}{2q+1}+a & \cdots & \frac{1}{2q+2p-3}+a \\
1-x^2 & \frac{2}{(2q-1)(2q+1)} & \frac{2}{(2q+1)(2q+3)} & \cdots & \frac{2}{(2q+2p-3)(2q+2p-1)} \\
(1-x^2)^2 & \frac{2\cdot 4}{(2q-1)(2q+1)(2q+3)} & \frac{2\cdot 4}{(2q+1)(2q+3)(2q+5)} & \cdots & \frac{2\cdot 4}{(2q+2p-3)\cdots(2q+2p+1)} \\
\vdots & \vdots & \vdots & & \vdots \\
x^{2p-4}(1-x^2)^2 & \frac{2\cdot 4}{(2q+2p-5)\cdots(2q+2p-1)} & \frac{2\cdot 4}{(2q+2p-3)\cdots(2q+2p+1)} & \cdots & \frac{2\cdot 4}{(2q+4p-7)\cdots(2q+4p-3)}
\end{vmatrix}.
\]
Leaving now three rows unaltered, next time four and so on, it is clear that we
shall at last after p of these sets of operations get
\[
{}^{q}_{p+1}d = (-1)^{\frac{p(p+1)}{2}} \times \begin{vmatrix}
1 & \frac{1}{2q-1}+a & \frac{1}{2q+1}+a & \cdots & \frac{1}{2q+2p-3}+a \\
1-x^2 & \frac{2}{(2q-1)(2q+1)} & \frac{2}{(2q+1)(2q+3)} & \cdots & \frac{2}{(2q+2p-3)(2q+2p-1)} \\
(1-x^2)^2 & \frac{2\cdot 4}{(2q-1)(2q+1)(2q+3)} & \frac{2\cdot 4}{(2q+1)(2q+3)(2q+5)} & \cdots & \frac{2\cdot 4}{(2q+2p-3)\cdots(2q+2p+1)} \\
\vdots & \vdots & \vdots & & \vdots \\
(1-x^2)^p & \frac{2\cdot 4 \cdots 2p}{(2q-1)\cdots(2q+2p-1)} & \frac{2\cdot 4 \cdots 2p}{(2q+1)\cdots(2q+2p+1)} & \cdots & \frac{2\cdot 4 \cdots 2p}{(2q+2p-3)\cdots(2q+4p-3)}
\end{vmatrix}.
\]
By treating the columns in the same way, leaving first two then three and so
on unaltered, we find after the first set of operations
\[
{}^{q}_{p+1}d = (-1)^{\frac{p(p+1)}{2}+p-1} \times \begin{vmatrix}
1 & \frac{1}{2q-1}+a & \frac{2}{(2q-1)(2q+1)} & \cdots & \frac{2}{(2q+2p-5)(2q+2p-3)} \\
1-x^2 & \frac{2}{(2q-1)(2q+1)} & \frac{2\cdot 4}{(2q-1)(2q+1)(2q+3)} & \cdots & \frac{2\cdot 4}{(2q+2p-5)\cdots(2q+2p-1)} \\
(1-x^2)^2 & \frac{2\cdot 4}{(2q-1)(2q+1)(2q+3)} & \frac{2\cdot 4\cdot 6}{(2q-1)(2q+1)(2q+3)(2q+5)} & \cdots & \frac{2\cdot 4\cdot 6}{(2q+2p-5)\cdots(2q+2p+1)} \\
\vdots & \vdots & \vdots & & \vdots \\
(1-x^2)^p & \frac{2\cdot 4 \cdots 2p}{(2q-1)(2q+1)\cdots(2q+2p-1)} & \frac{2\cdot 4 \cdots 2p(2p+2)}{(2q-1)(2q+1)\cdots(2q+2p+1)} & \cdots & \frac{2\cdot 4 \cdots 2p(2p+2)}{(2q+2p-5)\cdots(2q+4p-3)}
\end{vmatrix}
\]
and after (p - 1) sets of operations
\[
{}^{q}_{p+1}d = (-1)^{p} \times \begin{vmatrix}
1 & \frac{1}{2q-1}+a & \frac{2}{(2q-1)(2q+1)} & \cdots & \frac{2\cdot 4 \cdots (2p-2)}{(2q-1)(2q+1)\cdots(2q+2p-3)} \\
1-x^2 & \frac{2}{(2q-1)(2q+1)} & \frac{2\cdot 4}{(2q-1)(2q+1)(2q+3)} & \cdots & \frac{2\cdot 4 \cdots (2p-2)\,2p}{(2q-1)(2q+1)\cdots(2q+2p-1)} \\
(1-x^2)^2 & \frac{2\cdot 4}{(2q-1)(2q+1)(2q+3)} & \frac{2\cdot 4\cdot 6}{(2q-1)(2q+1)(2q+3)(2q+5)} & \cdots & \frac{2\cdot 4 \cdots 2p(2p+2)}{(2q-1)(2q+1)\cdots(2q+2p+1)} \\
\vdots & \vdots & \vdots & & \vdots \\
(1-x^2)^p & \frac{2\cdot 4 \cdots 2p}{(2q-1)(2q+1)\cdots(2q+2p-1)} & \frac{2\cdot 4 \cdots 2p(2p+2)}{(2q-1)(2q+1)\cdots(2q+2p+1)} & \cdots & \frac{2\cdot 4 \cdots (4p-4)(4p-2)}{(2q-1)(2q+1)\cdots(2q+4p-3)}
\end{vmatrix},
\]
since
\[ (-1)^{\frac{p(p+1)}{2}}\,(-1)^{\frac{p(p-1)}{2}} = (-1)^{p}. \]
Here the first element of the last p - 1 columns is seen to occur as factor for
the whole column so that we can put outside the factor
\[
\frac{2^{p-1}\cdot 4^{p-2} \cdots (2p-4)^2 (2p-2)}{(2q-1)^{p-1}(2q+1)^{p-1}(2q+3)^{p-2}(2q+5)^{p-3} \cdots (2q+2p-5)^2 (2q+2p-3)}
= \frac{1^{p-1}\cdot 2^{p-2} \cdots (p-2)^2 (p-1)\cdot 2^{\frac{p(p-1)}{2}}}{(2q-1)^{p-1}(2q+1)^{p-1}(2q+3)^{p-2}(2q+5)^{p-3} \cdots (2q+2p-5)^2 (2q+2p-3)},
\]
the resulting expression being
\[
{}^{q}_{p+1}d = \frac{(-1)^{p}\cdot 1^{p-1}\cdot 2^{p-2} \cdots (p-2)^2 (p-1)\cdot 2^{\frac{p(p-1)}{2}}}{(2q-1)^{p-1}(2q+1)^{p-1}(2q+3)^{p-2} \cdots (2q+2p-5)^2 (2q+2p-3)} \times
\begin{vmatrix}
1 & \frac{1}{2q-1}+a & 1 & \cdots & 1 \\
1-x^2 & \frac{2}{(2q-1)(2q+1)} & \frac{4}{2q+3} & \cdots & \frac{2p}{2q+2p-1} \\
(1-x^2)^2 & \frac{2\cdot 4}{(2q-1)(2q+1)(2q+3)} & \frac{4\cdot 6}{(2q+3)(2q+5)} & \cdots & \frac{2p(2p+2)}{(2q+2p-1)(2q+2p+1)} \\
\vdots & \vdots & \vdots & & \vdots \\
(1-x^2)^p & \frac{2\cdot 4 \cdots 2p}{(2q-1)(2q+1)\cdots(2q+2p-1)} & \frac{4\cdot 6 \cdots (2p+2)}{(2q+3)\cdots(2q+2p+1)} & \cdots & \frac{2p \cdots (4p-2)}{(2q+2p-1)\cdots(2q+4p-3)}
\end{vmatrix}.
\]








KIRSTINE SMITH


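In our formulae the two cases q = 1 or q = 2 only occur for which according to
this we find
\[
{}^{1}_{p+1}d = \frac{(-1)^{p}\cdot 1^{p-1}\cdot 2^{p-2} \cdots (p-2)^2 (p-1)\cdot 2^{\frac{p(p-1)}{2}}}{3^{p-1}\cdot 5^{p-2}\cdot 7^{p-3} \cdots (2p-3)^2 (2p-1)} \times
\begin{vmatrix}
1 & 1+a & 1 & 1 & \cdots & 1 \\
1-x^2 & \frac{2}{1\cdot 3} & \frac{4}{5} & \frac{6}{7} & \cdots & \frac{2p}{2p+1} \\
(1-x^2)^2 & \frac{2\cdot 4}{1\cdot 3\cdot 5} & \frac{4\cdot 6}{5\cdot 7} & \frac{6\cdot 8}{7\cdot 9} & \cdots & \frac{2p(2p+2)}{(2p+1)(2p+3)} \\
(1-x^2)^3 & \frac{2\cdot 4\cdot 6}{1\cdot 3\cdot 5\cdot 7} & \frac{4\cdot 6\cdot 8}{5\cdot 7\cdot 9} & \frac{6\cdot 8\cdot 10}{7\cdot 9\cdot 11} & \cdots & \frac{2p(2p+2)(2p+4)}{(2p+1)(2p+3)(2p+5)} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
(1-x^2)^p & \frac{2\cdot 4 \cdots 2p}{1\cdot 3 \cdots (2p+1)} & \frac{4\cdot 6 \cdots (2p+2)}{5\cdot 7 \cdots (2p+3)} & \frac{6\cdot 8 \cdots (2p+4)}{7\cdot 9 \cdots (2p+5)} & \cdots & \frac{2p(2p+2) \cdots (4p-2)}{(2p+1)(2p+3) \cdots (4p-1)}
\end{vmatrix}
\qquad (51)
\]
and
\[
{}^{2}_{p+1}d = \frac{(-1)^{p}\cdot 1^{p-1}\cdot 2^{p-2} \cdots (p-2)^2 (p-1)\cdot 2^{\frac{p(p-1)}{2}}}{3^{p-1}\cdot 5^{p-1}\cdot 7^{p-2} \cdots (2p-1)^2 (2p+1)} \times
\begin{vmatrix}
1 & \frac13+a & 1 & 1 & \cdots & 1 \\
1-x^2 & \frac{2}{3\cdot 5} & \frac{4}{7} & \frac{6}{9} & \cdots & \frac{2p}{2p+3} \\
(1-x^2)^2 & \frac{2\cdot 4}{3\cdot 5\cdot 7} & \frac{4\cdot 6}{7\cdot 9} & \frac{6\cdot 8}{9\cdot 11} & \cdots & \frac{2p(2p+2)}{(2p+3)(2p+5)} \\
(1-x^2)^3 & \frac{2\cdot 4\cdot 6}{3\cdot 5\cdot 7\cdot 9} & \frac{4\cdot 6\cdot 8}{7\cdot 9\cdot 11} & \frac{6\cdot 8\cdot 10}{9\cdot 11\cdot 13} & \cdots & \frac{2p(2p+2)(2p+4)}{(2p+3)(2p+5)(2p+7)} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
(1-x^2)^p & \frac{2\cdot 4\cdot 6 \cdots 2p}{3\cdot 5\cdot 7 \cdots (2p+3)} & \frac{4\cdot 6 \cdots (2p+2)}{7\cdot 9 \cdots (2p+5)} & \frac{6\cdot 8 \cdots (2p+4)}{9\cdot 11 \cdots (2p+7)} & \cdots & \frac{2p(2p+2) \cdots (4p-2)}{(2p+3)(2p+5) \cdots (4p+1)}
\end{vmatrix}
\qquad (52)
\]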


VI. Uniform continuous distribution of observations with additional clusters at the 
ends of the range; constant standard deviation of observations. Special formulae. 
(1) Our first task shall be to work out the formulae for ${}_n\sigma_y^2 - {}_{n-1}\sigma_y^2$ for values
of $n$ up to 6, the next to find what values should be given to $a$ in order to make
${}_n\sigma_y$ as flat a curve as possible within the range of observations.


1) With the notations just introduced (40) and (41) take the form 


1 
s 3 a2 
Ssy = 299 — 29-182 = _ (1+a) 28° 


















2 
- Sop 41 = 29419 — 29% = y (1+ a) 2? Ze ; 
pd - 9418 


From these formulae we find, after applying (45), (51) and (52),

$$S_1 = \frac{\sigma^2}{N}\,\frac{3(1+a)\,x^2}{1+3a} \qquad (53),$$

$$S_2 = \frac{\sigma^2}{N}\,\frac{5\,[2+3(1+a)(x^2-1)]^2}{4\,(1+6a)} \qquad (54),$$

$$S_3 = \frac{\sigma^2}{N}\,\frac{7(1+a)\,x^2\,[2+5(1+3a)(x^2-1)]^2}{4\,(1+3a)(1+10a)} \qquad (55),$$

$$S_4 = \frac{\sigma^2}{N}\,\frac{9(1+a)\,[8+20(2+9a)(x^2-1)+35(1+6a)(x^2-1)^2]^2}{64\,(1+6a)(1+15a)} \qquad (56),$$

$$S_5 = \frac{\sigma^2}{N}\,\frac{11(1+a)\,x^2\,[8+28(2+15a)(x^2-1)+63(1+10a)(x^2-1)^2]^2}{64\,(1+10a)(1+21a)} \qquad (57),$$

$$S_6 = \frac{\sigma^2}{N}\,\frac{13(1+a)\,[16+168(1+10a)(x^2-1)+126(3+40a)(x^2-1)^2+231(1+15a)(x^2-1)^3]^2}{256\,(1+15a)(1+28a)} \qquad (58).$$
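The closed forms (53) to (58) lend themselves to a quick numerical check. The following is a minimal Python sketch, not part of the memoir: the function names `s_terms` and `variance_factor` are illustrative assumptions, and the quantity returned is $N\,{}_n\sigma_y^2/\sigma^2$ built up as $1 + S_1 + \dots + S_n$.

```python
import math


def s_terms(a, x):
    """Return [S_1, ..., S_6] in units of sigma^2 / N, following (53)-(58).

    a is the ratio of observations in the end clusters to those spread
    uniformly over the range; x is the abscissa in [-1, 1].
    """
    t = x * x - 1.0  # the recurring quantity (x^2 - 1)
    s1 = 3 * (1 + a) * x**2 / (1 + 3 * a)
    s2 = 5 * (2 + 3 * (1 + a) * t) ** 2 / (4 * (1 + 6 * a))
    s3 = (7 * (1 + a) * x**2 * (2 + 5 * (1 + 3 * a) * t) ** 2
          / (4 * (1 + 3 * a) * (1 + 10 * a)))
    s4 = (9 * (1 + a) * (8 + 20 * (2 + 9 * a) * t + 35 * (1 + 6 * a) * t**2) ** 2
          / (64 * (1 + 6 * a) * (1 + 15 * a)))
    s5 = (11 * (1 + a) * x**2
          * (8 + 28 * (2 + 15 * a) * t + 63 * (1 + 10 * a) * t**2) ** 2
          / (64 * (1 + 10 * a) * (1 + 21 * a)))
    s6 = (13 * (1 + a)
          * (16 + 168 * (1 + 10 * a) * t + 126 * (3 + 40 * a) * t**2
             + 231 * (1 + 15 * a) * t**3) ** 2
          / (256 * (1 + 15 * a) * (1 + 28 * a)))
    return [s1, s2, s3, s4, s5, s6]


def variance_factor(n, a, x):
    """N * sigma_y^2 / sigma^2 for a polynomial of degree n (1 <= n <= 6)."""
    return 1.0 + sum(s_terms(a, x)[:n])


if __name__ == "__main__":
    # Second degree with a = .78735: the values at x = 0 and x = 1 agree.
    print(math.sqrt(variance_factor(2, 0.78735, 0.0)))
    print(math.sqrt(variance_factor(2, 0.78735, 1.0)))
```

For $a = 0$ each term reduces to $(2n+1)P_n(x)^2$ with $P_n$ the Legendre polynomial, which is a convenient sanity check on the coefficients.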
(2) We shall now look at ${}_n\sigma_y^2$ for special values of $n$ and as a first attempt at
finding a flat curve for ${}_n\sigma_y^2$ try to make $[{}_n\sigma_y^2]_{x=0} = [{}_n\sigma_y^2]_{x^2=1}$.


For a linear function we find, since ${}_1\sigma_y^2 = {}_0\sigma_y^2 + S_1$,

$${}_1\sigma_y^2 = \frac{\sigma^2}{N}\left(1 + \frac{3(1+a)}{1+3a}\,x^2\right) \qquad (59).$$

As $a$ is positive it is obvious that we cannot make $[{}_1\sigma_y^2]_{x=0} = [{}_1\sigma_y^2]_{x^2=1}$, which indeed we
knew beforehand. This follows because we have proved that ${}_n\sigma_y^2$ is of the $2n$th degree
and never lower.

For $x = 0$ we find $[{}_1\sigma_y^2]_{x=0} = \sigma^2/N$,
which holds for any symmetrical distribution of observations with constant
standard deviation. $a$ is the ratio between the number of observations at the ends
of the range and the number uniformly distributed through the range; it may
therefore vary from 0 to $\infty$. As $[{}_1\sigma_y^2]_{x^2=1}$ decreases when $a$ increases we get the
flattest possible curve when $a = \infty$, that is when the distribution of observations
consists of two groups at the ends of the range. Then the curve is, as already shown
in Section II,

$$\sigma_y^2 = \frac{\sigma^2}{N}\,(1 + x^2).$$
7 
To get a check on the degree of the function and at the same time a flatter curve
of $\sigma_y^2$ than that obtained from a uniform distribution we may choose something
between the two extreme cases and take for example $\tfrac14 N$ observations at each end
of the range and $\tfrac12 N$ uniformly distributed through the range.

Then $a = 1$ and, according to (59),

$${}_1\sigma_y^2 = \frac{\sigma^2}{N}\left(1 + \tfrac32 x^2\right),$$

with the maximum $[{}_1\sigma_y]_{x^2=1} = \dfrac{\sigma}{\sqrt N}\cdot 1.581$.


(3) For a function of the second degree we find, from (46),

$$[{}_2\sigma_y^2]_{x=0} = \frac{\sigma^2}{N}\,\frac{9(1+a)(1+5a)}{4(1+6a)},$$

and from (50),

$$[{}_2\sigma_y^2]_{x^2=1} = \frac{\sigma^2}{N}\,3(1+a)\left\{\frac{1}{1+3a} + \frac{2}{1+6a}\right\}.$$

We want to make these equal and this requires
$$3(1+5a)(1+3a) = 4\{1+6a+2(1+3a)\}$$
or $15a^2 - 8a - 3 = 0$.

This has only one positive root $a = .7873500$.
For this value $\dfrac{a}{2(1+a)}$, which is the ratio between the number of observations
at one end of the range and the total number of observations, is $.2202562$.


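As ${}_2\sigma_y^2 = {}_1\sigma_y^2 + S_2$ we find, from (59) and (54),

$${}_2\sigma_y^2 = \frac{\sigma^2}{N}\left\{1 + \frac{3(1+a)}{1+3a}\,x^2 + \frac{5\,[2+3(1+a)(x^2-1)]^2}{4(1+6a)}\right\} \qquad (60);$$

for $a = .7873500$ the curve is

$${}_2\sigma_y^2 = \frac{\sigma^2}{N}\,\{3.46837 - 6.27862x^2 + 6.27862x^4\},$$

which has minima at $x = \pm 1/\sqrt2$.
The extreme values in the range of observations are therefore

$\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 1.8624$ for $x = 0$ and $x = \pm 1$,

and $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 1.3779$ for $x = \pm .70711$.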
(4) For a function of the third degree we have, from (46),

$$[{}_3\sigma_y^2]_{x=0} = \frac{\sigma^2}{N}\,\frac{9(1+a)(1+5a)}{4(1+6a)},$$

and from (50),

$$[{}_3\sigma_y^2]_{x^2=1} = \frac{\sigma^2}{N}\,2(1+a)\left\{\frac{3}{1+6a} + \frac{5}{1+10a}\right\} \qquad (61).$$

Hence the condition that they are equal is
$$9(1+5a)(1+10a) = 32(2+15a)$$
or $90a^2 - 69a - 11 = 0$,
with one positive root $a = .9021461$.
From (60) and (55) we find

$${}_3\sigma_y^2 = \frac{\sigma^2}{N}\left\{1 + \frac{3(1+a)}{1+3a}\,x^2 + \frac{5\,[2+3(1+a)(x^2-1)]^2}{4(1+6a)} + \frac{7(1+a)\,x^2\,[2+5(1+3a)(x^2-1)]^2}{4(1+3a)(1+10a)}\right\} \qquad (62),$$

which for $a = .9021461$ becomes

$${}_3\sigma_y^2 = \frac{\sigma^2}{N}\,\{3.67775 + 17.78799x^2 - 48.56651x^4 + 30.77852x^6\}.$$

Besides the minimum for $x = 0$ this curve has other minima for $x^2 = .815820$
and maxima for $x^2 = .2361366$.

The maxima and minima are as follows:

For $x = \pm 1$ and $x = 0$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 1.9177$,
for $x = \pm .48594$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.3612$,
for $x = \pm .90323$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 1.6055$.

By choosing $a = .9021461$, that is by taking $.237139 \times N$ observations at each
end of the range, we seem therefore to have overshot our aim since the result is that
we have got inside the range a maximum for $\sigma_y$ greater than the value obtained for
$x = \pm 1$.

(5) Our next attempt shall be to make

$$[{}_3\sigma_y^2]_{x^2=1} = 2\,[{}_3\sigma_y^2]_{x=0}.$$

It requires $9(1+5a)(1+10a) = 16(2+15a)$
or $450a^2 - 105a - 23 = 0$.

The only positive root is $a = .3710723$ which gives the curve

$${}_3\sigma_y^2 = \frac{\sigma^2}{N}\,\{2.730117 + 12.89741x^2 - 37.07612x^4 + 26.90882x^6\}.$$

The maxima and minima are:

For $x = .0000$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 1.652$,
for $x = \pm .4828$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.016$,
for $x = \pm .8279$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 1.678$,
for $x = \pm 1.0000$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.337$.

This distribution of observations makes $\sigma_y$ for $x = \pm 1$ greater than the
maximum at $x = \pm .4828$. By interpolation between these two cases we shall
now try to find an $a$, lying between those of our two trials, for which $\sigma_y$ for
$x = \pm 1$ equals the maximum value of $\sigma_y$ which still may be expected at about
$x = .48$.


(6) In our first attempt we found $[\sigma_y]_{x=1} = \dfrac{\sigma}{\sqrt N}\cdot 1.918$ and its difference from the
maximum $-\dfrac{\sigma}{\sqrt N}\cdot .444$, in the second attempt $[\sigma_y]_{x=1} = \dfrac{\sigma}{\sqrt N}\cdot 2.337$ and its difference from
the maximum $+\dfrac{\sigma}{\sqrt N}\cdot .321$.
If the relation were linear this difference would be zero for

$$[\sigma_y]_{x=1} = \frac{\sigma}{\sqrt N}\cdot 2.1612.$$

The $a$ for which $[\sigma_y]_{x=1}$ takes this value is found by (61) which leads to
$$8(1+a)(2+15a) = 2.1612^2\,(1+6a)(1+10a)$$
or $160.2a^2 - 61.28a - 11.330 = 0$,
with the positive root $a = .519$.

For this value (62) becomes

$${}_3\sigma_y^2 = \frac{\sigma^2}{N}\,\{2.986 + 14.2364x^2 - 40.0058x^4 + 27.4521x^6\}.$$

The maxima and minima are:

For $x = .0000$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 1.728$,
for $x = \pm .4843$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.116$,
for $x = \pm .8585$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 1.658$,
for $x = \pm 1.0000$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.161$,

and this distribution which has $.1708 \times N$ observations at each end of the range
may be considered satisfactory.
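The interpolation step of (6) can be retraced numerically. The sketch below is an illustration only: it takes the rounded figures quoted in the text (1.918 with difference -.444, and 2.337 with difference +.321), interpolates linearly to the value of $\sigma_y\sqrt N/\sigma$ at $x = 1$ for which the difference vanishes, and then solves the quadratic that (61) gives for $a$. The helper names are assumptions.

```python
import math

# (value at x = 1, value at x = 1 minus interior maximum) for the two trials
first_trial = (1.918, -0.444)   # a = .9021461
second_trial = (2.337, 0.321)   # a = .3710723


def interpolated_target(p, q):
    """Value at x = 1 for which the difference is zero, assuming linearity."""
    (v1, d1), (v2, d2) = p, q
    return v1 - d1 * (v2 - v1) / (d2 - d1)


def a_for_target(target):
    """Positive root of 8(1+a)(2+15a) = target^2 (1+6a)(1+10a)."""
    t2 = target * target
    # Collect powers of a: (60 t2 - 120) a^2 + (16 t2 - 136) a + (t2 - 16) = 0
    p, q, r = 60 * t2 - 120, 16 * t2 - 136, t2 - 16
    return (-q + math.sqrt(q * q - 4 * p * r)) / (2 * p)


target = interpolated_target(first_trial, second_trial)
a = a_for_target(target)
end_share = a / (2 * (1 + a))  # share of N placed at each end of the range

if __name__ == "__main__":
    print(round(target, 4), round(a, 3), round(end_share, 4))
```

With these inputs the target comes out close to 2.1612, $a$ close to .519 and the share at each end close to .1708, in line with the figures given above.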


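(7) From (46) and (50) we find, for a function of the fourth degree,

$$[{}_4\sigma_y^2]_{x=0} = \frac{\sigma^2}{N}\,\frac{225\,(1+a)(1+14a)}{64\,(1+15a)}$$

and

$$[{}_4\sigma_y^2]_{x^2=1} = \frac{\sigma^2}{N}\,5(1+a)\left\{\frac{2}{1+10a} + \frac{3}{1+15a}\right\},$$

which are equal when
$$9(1+14a)(1+10a) = 64(1+12a)$$
or $1260a^2 - 552a - 55 = 0$,
that is when $a = .5217564$.

The formula for ${}_4\sigma_y^2$, found from (62) and (56), is

$${}_4\sigma_y^2 = \frac{\sigma^2}{N}\left\{1 + \frac{3(1+a)}{1+3a}\,x^2 + \frac{5\,[2+3(1+a)(x^2-1)]^2}{4(1+6a)} + \frac{7(1+a)\,x^2\,[2+5(1+3a)(x^2-1)]^2}{4(1+3a)(1+10a)} + \frac{9(1+a)\,[8+20(2+9a)(x^2-1)+35(1+6a)(x^2-1)^2]^2}{64\,(1+6a)(1+15a)}\right\} \qquad (63).$$

For $a = .5217564$ it becomes

$${}_4\sigma_y^2 = \frac{\sigma^2}{N}\,\{5.03367 - 19.72772x^2 + 133.01711x^4 - 235.96817x^6 + 122.67868x^8\}.$$

The maxima and minima are as follows:

For $x = \pm 1$ and $x = 0$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.244$,
for $x = \pm .3130$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.041$,
for $x = \pm .6844$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.575$,
for $x = \pm .9361$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 1.856$.

We have again as for the function of the third degree brought $[\sigma_y]_{x^2=1}$ down below
one of the maxima of ${}_4\sigma_y$, although since ${}_4\sigma_y$ has a maximum at $x = 0$ the demand
that $[\sigma_y]_{x=0} = [\sigma_y]_{x^2=1}$ is not so exacting as for ${}_3\sigma_y$ which has a minimum at $x = 0$.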
(8) We shall next make $[{}_4\sigma_y^2]_{x^2=1} = 1.2671861\,[{}_4\sigma_y^2]_{x=0}$*.
The condition obtained from (46) and (50) is
$$9 \times 1.2671861\,(1+10a)(1+14a) = 64(1+12a)$$
or $a^2 - .3095773a - .032940969 = 0$,
with the only positive root $a = .3933269$.
Introducing this value of $a$ in (63) we get

$${}_4\sigma_y^2 = \frac{\sigma^2}{N}\,\{4.61918 - 18.02388x^2 + 122.71833x^4 - 220.34099x^6 + 116.8807x^8\}.$$

The maxima and minima for this curve are:

At $x = 0$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.149$,
at $x = \pm .3116$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 1.958$,
at $x = \pm .6839$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.467$,
at $x = \pm .9214$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 1.918$,
at $x = \pm 1.0000$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.419$.

We have thus for $a = .3933269$, that is by taking $.141147 \times N$ observations at
each end of the range, succeeded in bringing $[{}_4\sigma_y]_{x^2=1}$ down to be approximately equal
to the highest of the maxima of the curve, thus fulfilling our purpose.


(9) After our experiences in the cases of the functions of the third and fourth
degree we cannot expect for a function of the fifth degree by making
$$[{}_5\sigma_y^2]_{x^2=1} = [{}_5\sigma_y^2]_{x=0}$$
to find a curve which has not a greater maximum than that value. We shall
therefore start with the attempt

$$[{}_5\sigma_y^2]_{x^2=1} = 2\,[{}_5\sigma_y^2]_{x=0}.$$

The condition found from (46) and (50) is
$$25(1+14a)(1+21a) = 64(2+35a)$$
or $7350a^2 - 1365a - 103 = 0$,
with the only positive root $a = .2433100$.


* The ratio 1.2671861 results from consideration of a special $\sigma_y$ curve. It was determined as
that curve obtained from three groups of observations for which the standard deviation of the $\sigma_y$'s within
the range of observations was a minimum. It is not mentioned elsewhere in this memoir as it does not
seem to have the interest I at first assumed it to have.
For ${}_5\sigma_y^2$ we find, from (63) and (57),

$${}_5\sigma_y^2 = \frac{\sigma^2}{N}\left\{1 + \frac{3(1+a)}{1+3a}\,x^2 + \frac{5\,[2+3(1+a)(x^2-1)]^2}{4(1+6a)} + \frac{7(1+a)\,x^2\,[2+5(1+3a)(x^2-1)]^2}{4(1+3a)(1+10a)} + \frac{9(1+a)\,[8+20(2+9a)(x^2-1)+35(1+6a)(x^2-1)^2]^2}{64\,(1+6a)(1+15a)} + \frac{11(1+a)\,x^2\,[8+28(2+15a)(x^2-1)+63(1+10a)(x^2-1)^2]^2}{64\,(1+10a)(1+21a)}\right\} \qquad (64).$$

Introducing $a = .2433100$ we get

$${}_5\sigma_y^2 = \frac{\sigma^2}{N}\,\{4.14228 + 28.47030x^2 - 258.05238x^4 + 853.0448x^6 - 1095.9212x^8 + 476.5990x^{10}\},$$

from which we find the maxima and minima:

At $x = 0$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.035$,
at $x = \pm .2953$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.273$,
at $x = \pm .5004$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.155$,
at $x = \pm .7853$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.762$,
at $x = \pm .9418$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.231$,
at $x = \pm 1.0000$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.878$.

$[\sigma_y]_{x^2=1}$ does not differ much from the greatest maximum and we may thus consider
the distribution with $.097848 \times N$ observations at each end of the range for which
$a = .2433100$ as satisfying fairly well our aim.


(10) Considering our previous results we must assume that for a function of
the sixth degree $[\sigma_y^2]_{x^2=1} / [\sigma_y^2]_{x=0}$ ought to be made somewhat smaller than 2 which was
the value that gave a satisfying result for a function of the fifth degree.
Let us assume $[{}_6\sigma_y^2]_{x^2=1} = 1.75\,[{}_6\sigma_y^2]_{x=0}$ or, substituting from (46) and (50),
$$256(1+24a) = 1.75 \times 25\,(1+21a)(1+27a),$$
from which $567a^2 - 92.43430a - 4.851429 = 0$
and $a = .2048019$
are found.
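Each of the conditions in (3) to (10) is a quadratic in $a$ with exactly one positive root, from which the share of observations at each end of the range follows as $a/\{2(1+a)\}$. The following Python sketch simply re-solves the quadratics quoted in the text; the dictionary keys and function names are illustrative and not taken from the memoir.

```python
import math


def positive_root(p, q, r):
    """Positive root of p*a**2 + q*a + r = 0, assuming p > 0 and r < 0."""
    return (-q + math.sqrt(q * q - 4 * p * r)) / (2 * p)


def end_share(a):
    """Fraction of all N observations placed at each end of the range."""
    return a / (2 * (1 + a))


# Coefficients (p, q, r) as printed in the text for each trial.
conditions = {
    "degree 2, equal at 0 and 1": (15, -8, -3),
    "degree 3, equal at 0 and 1": (90, -69, -11),
    "degree 3, ratio 2": (450, -105, -23),
    "degree 4, equal at 0 and 1": (1260, -552, -55),
    "degree 4, ratio 1.2671861": (1, -0.3095773, -0.032940969),
    "degree 5, ratio 2": (7350, -1365, -103),
    "degree 6, ratio 1.75": (567, -92.43430, -4.851429),
}

roots = {name: positive_root(*coeffs) for name, coeffs in conditions.items()}

if __name__ == "__main__":
    for name, a in roots.items():
        print(f"{name}: a = {a:.7f}, share at each end = {end_share(a):.6f}")
```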
For ${}_6\sigma_y^2$ we get, from (64) and (58),

$${}_6\sigma_y^2 = \frac{\sigma^2}{N}\left\{1 + \frac{3(1+a)}{1+3a}\,x^2 + \frac{5\,[2+3(1+a)(x^2-1)]^2}{4(1+6a)} + \frac{7(1+a)\,x^2\,[2+5(1+3a)(x^2-1)]^2}{4(1+3a)(1+10a)} + \frac{9(1+a)\,[8+20(2+9a)(x^2-1)+35(1+6a)(x^2-1)^2]^2}{64\,(1+6a)(1+15a)} + \frac{11(1+a)\,x^2\,[8+28(2+15a)(x^2-1)+63(1+10a)(x^2-1)^2]^2}{64\,(1+10a)(1+21a)} + \frac{13(1+a)\,[16+168(1+10a)(x^2-1)+126(3+40a)(x^2-1)^2+231(1+15a)(x^2-1)^3]^2}{256\,(1+15a)(1+28a)}\right\},$$

which for $a = .2048019$ becomes

$${}_6\sigma_y^2 = \frac{\sigma^2}{N}\,\{5.58984 - 33.14234x^2 + 504.4523x^4 - 2512.673x^6 + 5524.186x^8 - 5452.650x^{10} + 1974.020x^{12}\}.$$

The maxima and minima are:

At $x = 0$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.364$,
at $x = \pm .2216$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.216$,
at $x = \pm .4826$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.515$,
at $x = \pm .6194$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.427$,
at $x = \pm .8445$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 3.149$,
at $x = \pm .9615$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 2.485$,
at $x = \pm 1.0000$: $\sigma_y = \dfrac{\sigma}{\sqrt N}\cdot 3.128$.

It thus appears that this distribution which has $.08499 \times N$ observations at
each end of the range fulfils our demand that $[\sigma_y]_{x=1}$ shall be approximately equal
to the greatest of the maxima.


(11) We bring together our final results in the following table. It gives the
distribution of observations, the maximum of $\sigma_y$ within the range, the value of
$\sqrt{n+1}$ or the lowest maximum of $\sigma_y\sqrt N/\sigma$ possible, which can only be obtained
by distributing the observations of the function of the $n$th degree into $(n+1)$
groups, and the value of $n+1$ which is the maximum of $\sigma_y\sqrt N/\sigma$ for a uniform
distribution.

TABLE II.

Degree of    Ratio of number of observations    Maximum of     √(n+1)    n+1
function     at each end of the range to        σ_y √N / σ
             the total number

   1                   .2500                      1.581         1.414      2
   2                   .2203                      1.862         1.732      3
   3                   .1708                      2.161         2.000      4
   4                   .1411                      2.467         2.236      5
   5                   .0978                      2.878         2.449      6
   6                   .0850                      3.149         2.646      7

A comparison between our maximum and $\sqrt{n+1}$ shows the price we have to
pay for information about the degree of the function. For lower degrees the
maximum only differs quite insignificantly from $\sqrt{n+1}$, but with increasing
degree the difference grows relatively greater, for the sixth degree being about
one-fifth of $\sqrt{n+1}$.
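The "price" referred to here can be put in figures directly from Table II. The short Python sketch below is illustrative only: it recomputes $\sqrt{n+1}$, the relative excess of the tabulated maximum over it, and the value of $a$ implied by each end share $r$ through $a = 2r/(1-2r)$. The variable names are assumptions.

```python
import math

# (degree n, share of N at each end, maximum of sigma_y * sqrt(N) / sigma)
table_ii = [
    (1, 0.2500, 1.581),
    (2, 0.2203, 1.862),
    (3, 0.1708, 2.161),
    (4, 0.1411, 2.467),
    (5, 0.0978, 2.878),
    (6, 0.0850, 3.149),
]

summary = {}
for n, share, maximum in table_ii:
    lowest = math.sqrt(n + 1)              # attainable only with n + 1 groups
    excess = (maximum - lowest) / lowest   # relative price paid
    a = 2 * share / (1 - 2 * share)        # clusters : uniform part
    summary[n] = (lowest, excess, a)

if __name__ == "__main__":
    for n, (lowest, excess, a) in summary.items():
        print(f"n={n}: sqrt(n+1)={lowest:.3f}, excess={excess:.1%}, a={a:.4f}")
```

On these figures the excess is roughly 8 per cent for the second and third degrees and about 19 per cent for the sixth.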


The curves of standard deviation for the three sets of distributions are given
in Diagrams 3 to 8, while Diagram 9 represents the six curves just reached.
It seems likely from the form of the $\sigma_y$ curves that two clusters of observations
placed at the outermost of the maxima besides the two clusters at the ends of the
range would produce a $\sigma_y$ curve with a lower maximum than the one we have
succeeded in getting for the functions from the fourth to the sixth degree. But
then again the position of these new clusters would depend on the degree of the
function and thus make the proceedings more complicated; and what is more, at
the same time as the maximum of the curve approached $\sqrt{n+1}$ the distribution
of observations would incur the disadvantages of the grouping in $(n+1)$ clusters.
On the whole the distribution arrived at seems to be satisfactory and certainly
marks a great progress from the uniform distribution.


VII. Observations with varying standard deviation. 


(1) In Section I we have already given the formula for the standard deviation $\sigma_y$
of an adjusted $y$ when the standard deviation of an observation is $\sigma\sqrt{f(x)}$.
It is

$$\begin{vmatrix}
\dfrac{N\sigma_y^2}{\sigma^2} & 1 & x & x^2 & \cdots & x^n \\
1 & m_0 & m_1 & m_2 & \cdots & m_n \\
x & m_1 & m_2 & m_3 & \cdots & m_{n+1} \\
x^2 & m_2 & m_3 & m_4 & \cdots & m_{n+2} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
x^n & m_n & m_{n+1} & m_{n+2} & \cdots & m_{2n}
\end{vmatrix} = 0,$$

where $m_p = \dfrac1N \displaystyle\int \frac{x^p\,\psi(x)}{f(x)}\,dx$, $\psi(x)\,dx$ being the number of observations between
$x$ and $x + dx$ and the integration being extended over the range of observations.





It is clear that if we have found a suitable curve of squared standard deviation
for adjusted $y$ by taking a distribution $\phi(x)$ of observations with constant standard
deviations a corresponding curve can be derived for observations with varying
standard deviations by using the distribution

$$\psi(x) = k\,\phi(x)\,f(x) \qquad (65).$$

As $\int k\,\phi(x)\,f(x)\,dx = N$ the constant $k$ must be

$$k = \frac{N}{\int \phi(x)\,f(x)\,dx}.$$

Hence we find

$$m_p = \frac{\int x^p\,\phi(x)\,dx}{\int \phi(x)\,f(x)\,dx} = \frac{N\mu_p}{\int \phi(x)\,f(x)\,dx},$$

where $\mu_p$ is the $p$th moment coefficient for the distribution $\phi(x)$, and as

$$\frac{m_p}{\mu_p} = \frac{N}{\int \phi(x)\,f(x)\,dx} = k$$

for any $p$ the determinant may be written

$$\begin{vmatrix}
\dfrac{kN\sigma_y^2}{\sigma^2} & 1 & x & x^2 & \cdots & x^n \\
1 & 1 & \mu_1 & \mu_2 & \cdots & \mu_n \\
x & \mu_1 & \mu_2 & \mu_3 & \cdots & \mu_{n+1} \\
x^2 & \mu_2 & \mu_3 & \mu_4 & \cdots & \mu_{n+2} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
x^n & \mu_n & \mu_{n+1} & \mu_{n+2} & \cdots & \mu_{2n}
\end{vmatrix} = 0 \qquad (66).$$

We thus find the same determinant as the distribution $\phi(x)$ would give for
observations with constant error of observation except that the factor $k$ has come
in, that is to say the expression for $\sigma_y^2$ has been multiplied by

$$\frac1k = \frac1N \int \phi(x)\,f(x)\,dx \qquad (67).$$

The goodness of the distribution therefore will partly depend on the value of
$\dfrac1k$, and because we have found $\phi(x)$ the best distribution for observations with constant
standard deviation it does not follow that

$$\psi(x) = k\,\phi(x)\,f(x)$$

is the best distribution for observations with the standard deviation $\sigma\sqrt{f(x)}$.

But the deriving of $\psi(x)$ from $\phi(x)$ is nevertheless useful as a means of simplifying
the investigations and will be applied in the following special inquiries.

We shall consider two forms of f(x) and try to find the best distributions for 
functions of the first and of the second degree. 










(a) $f(x) = (1 + ax^2)^2$, where $a > -1$,
for errors of observation increasing or decreasing in both directions from the middle
of the range.


(b) $f(x) = (1 + ax)^2$, where $1 > a \ge 0$,
for errors of observation increasing in one direction.


These two forms will roughly cover two distinct and important types of cases, 
such as occur in practice. 
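The effect of passing from $\phi(x)$ to $\psi(x) = k\,\phi(x)\,f(x)$ can be illustrated on a small discrete example. The Python sketch below is an assumption-laden illustration, not the memoir's procedure: it takes a made-up three-point $\phi$, uses form (a) for $f(x)$ with an arbitrary $a$, fits a straight line, and checks that the variance factor $N\sigma_y^2/\sigma^2$ under $\psi$ equals $1/k$ times the factor that $\phi$ gives with constant standard deviation.

```python
def linear_variance_factor(groups, x):
    """N * sigma_y^2 / sigma^2 at abscissa x for a weighted straight-line fit.

    groups is a list of (x_i, n_i, f_i): position, number of observations
    there, and f(x_i), the squared standard deviation in units of sigma^2.
    """
    total = sum(n for _, n, _ in groups)
    # m_p = (1/N) * sum of n_i * x_i^p / f_i, the discrete analogue of m_p
    m0, m1, m2 = (sum(n * xi**p / f for xi, n, f in groups) / total
                  for p in range(3))
    return (m2 - 2 * m1 * x + m0 * x * x) / (m0 * m2 - m1 * m1)


a = 0.5                                         # arbitrary illustrative value


def f(x):
    return (1 + a * x * x) ** 2                 # form (a)


phi = [(-1.0, 30.0), (0.2, 50.0), (1.0, 20.0)]  # hypothetical phi, N = 100
N = sum(n for _, n in phi)
k = N / sum(n * f(x) for x, n in phi)           # from the integral of psi = N
psi = [(x, k * n * f(x)) for x, n in phi]       # psi = k * phi * f

x0 = 0.7                                        # any point in the range
constant_sd = linear_variance_factor([(x, n, 1.0) for x, n in phi], x0)
varying_sd = linear_variance_factor([(x, n, f(x)) for x, n in psi], x0)

if __name__ == "__main__":
    print(varying_sd, constant_sd / k)          # the two agree
```

The group sizes in `psi` are not whole numbers here; the example only checks the algebra of (65) to (67).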
(2) When $f(x) = (1 + ax^2)^2$ we find, according to (67),

$$\frac1k = 1 + 2a\mu_2 + a^2\mu_4,$$

and as (66) for $n = 1$ gives

$$\frac{kN\sigma_y^2}{\sigma^2} = \frac{1}{\mu_2 - \mu_1^2}\,\{\mu_2 - 2\mu_1 x + x^2\},$$

we have for a function of the first degree

$$\sigma_y^2 = \frac{\sigma^2}{N}\,\frac{1 + 2a\mu_2 + a^2\mu_4}{\mu_2 - \mu_1^2}\,\{\mu_2 - 2\mu_1 x + x^2\} \qquad (68).$$

This curve has a minimum for $x = \mu_1$ and the maximum in the range is, if
$\mu_1 > 0$, at $x = -1$, and if $\mu_1 < 0$, at $x = 1$; it equals in both cases

$$\frac{\sigma^2}{N}\,(1 + 2a\mu_2 + a^2\mu_4)\left\{1 + \frac{(1 + |\mu_1|)^2}{\mu_2 - \mu_1^2}\right\} \qquad (69),$$

$|\mu_1|$ being the numerical value of $\mu_1$.

Now (69) is a minimum for $\mu_1 = 0$; we therefore ought to choose that value
for $\mu_1$, and we then get, from (68),

$$\sigma_y^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_2 + a^2\mu_4)\left\{1 + \frac{x^2}{\mu_2}\right\} \qquad (70).$$


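$\mu_2$ and $\mu_4$ may vary between 0 and 1 independently of each other and are only
bound by the conditions that

$$\mu_4 \ge \mu_2^2 \quad\text{and}\quad \mu_2 \ge \mu_4.$$

For any set of values which satisfies these conditions we may determine a
distribution consisting of $\gamma N$ observations divided equally between $x = +v$ and $x = -v$, and $(1-\gamma)N$ at $x = 0$,
since from any two such values we could determine

$$v^2 = \frac{\mu_4}{\mu_2} \quad\text{and}\quad \gamma = \frac{\mu_2^2}{\mu_4}.$$

By introducing $v^2$ and $\gamma$ for $\mu_2$ and $\mu_4$ we get two quite independent variables
and (70) then takes the form

$$\sigma_y^2 = \frac{\sigma^2}{N}\,(1 + 2a\gamma v^2 + a^2\gamma v^4)\left(1 + \frac{x^2}{\gamma v^2}\right).$$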
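We now have to determine $\gamma$ and $v^2$ so that the maximum value $[\sigma_y^2]_{x=1}$ is as small
as possible. We find

$$\frac{d\,[\sigma_y^2]_{x=1}}{d\gamma} = \frac{\sigma^2}{N}\left(2av^2 + a^2v^4 - \frac{1}{\gamma^2 v^2}\right) \qquad (71)$$

and

$$\frac{d\,[\sigma_y^2]_{x=1}}{dv^2} = \frac{\sigma^2}{N}\left(2a\gamma + 2a^2\gamma v^2 + a^2 - \frac{1}{\gamma v^4}\right) \qquad (72).$$

Clearly $\dfrac{d\,[\sigma_y^2]_{x=1}}{d\gamma} = 0$ leads to

$$\gamma^2 = \frac{1}{av^4\,(2 + av^2)} \qquad (73).$$

Introducing this value into (72) we obtain

$$\frac{d\,[\sigma_y^2]_{x=1}}{dv^2} = \frac{\sigma^2}{N}\,a^2\left(1 + \frac{1}{\sqrt{a\,(2 + av^2)}}\right),$$

which is $> 0$.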


Hence the minimum for constant $v^2$ determined by $\dfrac{d\,[\sigma_y^2]_{x=1}}{d\gamma} = 0$
decreases with $v^2$.

But when $v^2$ decreases, $\gamma^2$, as given by (73), increases and the lowest value of $v$,
for which it is real, is that determined by

$$\gamma^2 = \frac{1}{av^4\,(2 + av^2)} = 1.$$

For $v^2$ smaller than this (73) gives $\gamma^2 > 1$, and as long as $\gamma^2 \le 1$ we therefore
have

$$\frac{d\,[\sigma_y^2]_{x=1}}{d\gamma} < 0.$$

Hence the minimum of $[\sigma_y^2]_{x=1}$ is to be found for $\gamma^2 = 1$.


For this value (72) may be written as

$$\frac{d\,[\sigma_y^2]_{x=1}}{dv^2} = \frac{\sigma^2}{N v^4}\,(1 + av^2)(2av^4 + av^2 - 1) \qquad (74),$$

which is zero for

$$v^2 = -\tfrac14 + \sqrt{\tfrac1{16} + \frac{1}{2a}} \qquad (75)$$

and $> 0$ for $v^2$ greater than this value.

When the $v^2$ found lies between 0 and 1, that is when $a > \tfrac13$, we have thus found
the minimum sought. When $a \le \tfrac13$, then $\dfrac{d\,[\sigma_y^2]_{x=1}}{dv^2}$ as given by (74) is $< 0$ and the
minimum of $[\sigma_y^2]_{x=1}$ is found by giving $v^2$ its maximum value, that is 1.


Returning to the variates $\mu_2$ and $\mu_4$, we see that in all cases

$$\gamma = \frac{\mu_2^2}{\mu_4} = 1,$$

from which it follows that no distribution of observations other than those arrived
at consisting of two equally big groups can give $\mu_1$, $\mu_2$ and $\mu_4$ the values required.

We accordingly reach the result that: when observing a function of the first degree
for which the standard deviation of the observations is $\sigma(1 + ax^2)$, symmetrical about
the middle of the range, we get the best function for $\sigma_y^2$ by taking two equally big groups
of observations, at the ends of the range if $a \le \tfrac13$ and at $x = \pm\sqrt{-\tfrac14 + \sqrt{\tfrac1{16} + \tfrac{1}{2a}}}$ if
$a > \tfrac13$.


(3) According to (70) the maximum of $\sigma_y^2$ for this distribution is

$$[\sigma_y^2]_{x=1} = \frac{\sigma^2}{N}\,(1 + av^2)^2\left(1 + \frac{1}{v^2}\right),$$

$v$ being equal to 1 for $a \le \tfrac13$ and $v$ being determined by (75) for $a > \tfrac13$.
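As a numerical illustration of this result, the following Python sketch evaluates $v$ from (75) and the corresponding maximum of $\sigma_y\sqrt N/\sigma$ for a given $a$. The function name `best_two_groups` is an assumption made for the example.

```python
import math


def best_two_groups(a):
    """Return (v, 1 + a*v^2, max of sigma_y*sqrt(N)/sigma) for f = (1 + a x^2)^2.

    Two equal groups are placed at x = +v and x = -v; v = 1 when a <= 1/3,
    otherwise v^2 follows from (75).
    """
    if a <= 1.0 / 3.0:
        v2 = 1.0
    else:
        v2 = -0.25 + math.sqrt(1.0 / 16.0 + 1.0 / (2.0 * a))
    max_factor = (1 + a * v2) ** 2 * (1 + 1 / v2)   # N * sigma_y^2 / sigma^2 at x = 1
    return math.sqrt(v2), 1 + a * v2, math.sqrt(max_factor)


if __name__ == "__main__":
    for a in (0, 1 / 6, 1 / 3, 1 / 2, 2 / 3, 5 / 6, 1, 2, 3, 4):
        v, s, m = best_two_groups(a)
        print(f"a={a:.4f}  v={v:.4f}  1+av^2={s:.3f}  max={m:.3f}")
```

For $a = 1$ this gives $v = .7071$ and a maximum of $2.598$, and for $a = 4$ it gives $v = .4278$ and $4.404$.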
We shall next consider the distributions (i) for which $\phi(x)$ is constant from $-1$
to 1 and (ii) for which $\phi(x)$ consists of $\tfrac N2$ observations uniformly distributed from
$-1$ to 1 and $\tfrac N2$ divided into two clusters.

(i) For a uniform distribution from $-1$ to 1 we have $\mu_2 = \tfrac13$, $\mu_4 = \tfrac15$ and,
according to (67),

$$\frac1k = 1 + \tfrac23 a + \tfrac15 a^2;$$

the actual distribution is hence, as $\phi(x) = \tfrac N2$,

$$\psi(x) = \frac N2\,\frac{(1 + ax^2)^2}{1 + \tfrac23 a + \tfrac15 a^2},$$

and the maximum $\sigma_y^2$, as given by (70) for $x = \pm 1$,

$$[\sigma_y^2]_{x=\pm1} = \frac{\sigma^2}{N}\,(1 + \tfrac23 a + \tfrac15 a^2)\cdot 4.$$

(ii) When $\phi(x) = \tfrac N4$ with the additional clusters $\tfrac N4$ at $\pm u$ we have

$$\mu_2 = \tfrac16 + \tfrac12 u^2 \quad\text{and}\quad \mu_4 = \tfrac1{10} + \tfrac12 u^4.$$

According to (70) the maximum $\sigma_y^2$ is then

$$[\sigma_y^2]_{x=\pm1} = \frac{\sigma^2}{N}\left[1 + a\left(\tfrac13 + u^2\right) + a^2\left(\tfrac1{10} + \tfrac12 u^4\right)\right]\left(1 + \frac{6}{1 + 3u^2}\right).$$

We shall now determine $u$ so as to make this a minimum. We find that
$\dfrac{d\,[\sigma_y^2]_{x=1}}{du^2} = 0$
requires

$$45a^2u^6 + 15a(3 + 5a)u^4 + 5a(6 + 7a)u^2 - (90 - 5a + 9a^2) = 0 \qquad (76),$$

the root $u^2$ of which is $> 1$ for $a < .5576$.

For $a \le .5576$ we hence get the minimum $[\sigma_y^2]_{x=1}$ by taking the clusters at $u = \pm 1$
and for $a > .5576$ at the places $\pm u$ determined by (76).
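Equation (76) is a cubic in $u^2$ and is easily solved numerically. The sketch below, which is only an illustration with assumed function names, finds the root by bisection (the left-hand side is increasing in $u^2$ for $a > 0$), falls back on $u = 1$ when the root would lie outside the range, and evaluates the corresponding maximum of $\sigma_y\sqrt N/\sigma$ from the expression given under (ii).

```python
import math


def cluster_position(a):
    """Position u of the clusters for phi = N/4 plus clusters N/4 at +-u."""
    def g(t):  # left-hand side of (76) with t = u^2
        return (45 * a * a * t**3 + 15 * a * (3 + 5 * a) * t**2
                + 5 * a * (6 + 7 * a) * t - (90 - 5 * a + 9 * a * a))

    if a <= 0 or g(1.0) <= 0:      # root u^2 >= 1: keep clusters at the ends
        return 1.0
    lo, hi = 0.0, 1.0              # g(0) < 0 < g(1) and g is increasing
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(0.5 * (lo + hi))


def max_with_clusters(a):
    """Maximum of sigma_y*sqrt(N)/sigma at x = +-1 for distribution (ii)."""
    u2 = cluster_position(a) ** 2
    inv_k = 1 + a * (1 / 3 + u2) + a * a * (1 / 10 + u2 * u2 / 2)
    return math.sqrt(inv_k * (1 + 6 / (1 + 3 * u2)))


if __name__ == "__main__":
    for a in (0, 0.5, 1, 2, 3, 4):
        print(a, round(cluster_position(a), 4), round(max_with_clusters(a), 3))
```

Bisection is used here only because it needs no derivative and cannot leave the interval; any root finder would serve.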


Table III contains for a series of values of $a$ the values of $v$, $(1 + av^2)$ and $u$
of the two distributions above and the maximum $\sigma_y$ for the three distributions.

TABLE III.

                          Maximum of σ_y √N / σ                Maximum of σ_y √N / σ
                          ----------------------               from distribution
  a      v      1+av²     from the     from distribution   u   φ(x) = N/4 and
                          best         for φ(x) = N/2          clusters of N/4
                          distribution                         at ±u

  0    1.0000   1.000      1.414         2.000       1.0000      1.581
 1/6   1.0000   1.167      1.650         2.113       1.0000      1.760
 1/3   1.0000   1.333      1.886         2.231       1.0000      1.944
 1/2    .8836   1.390      2.100         2.352       1.0000      2.131
 2/3    .8071   1.434      2.284         2.477        .9289      2.316
 5/6    .7510   1.470      2.448         2.603        .8502      2.483
  1     .7071   1.500      2.598         2.733        .7797      2.637
  2     .5559   1.618      3.330         3.540        .5762      3.438
  3     .4782   1.686      3.908         4.382        .4925      4.173
  4     .4278   1.732      4.404         5.241        .4612      4.899







The difference between the maxima from the two first distributions taken as a 
proportion of the maximum of the first decreases from 41 per cent. at a = 0 to the 
minimum 5 per cent. at a = 1, and then again increases to 19 per cent. ata = 4. For 
small a, that is in practice a = 0, and again for a > 3, for which the difference is 
greater than 12 per cent., the third distribution may therefore be useful as giving 
a much smaller maximum value than the purely continuous distribution and at 
the same time offering some justification for the form of the function. 


(4) We shall next, still assuming that $f(x) = (1 + ax^2)^2$, consider the choice
of observations for a function of the second degree.
According to (66) and (67) we find

$$\sigma_y^2 = \frac{\sigma^2}{Nk}\left\{\frac{\mu_2\mu_4 - \mu_3^2 + 2(\mu_2\mu_3 - \mu_1\mu_4)\,x + (\mu_4 - 3\mu_2^2 + 2\mu_1\mu_3)\,x^2 + 2(\mu_1\mu_2 - \mu_3)\,x^3 + (\mu_2 - \mu_1^2)\,x^4}{\mu_2\mu_4 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4 - \mu_2^3}\right\} \qquad (77),$$

and $\dfrac1k = 1 + 2a\mu_2 + a^2\mu_4$,

where the $\mu$'s are the moment coefficients about $x = 0$ of the distribution $\phi(x)$
which is connected with the actual distribution $\psi(x)$ by the relation
$$\psi(x) = k\,\phi(x)\,f(x).$$
From any distribution $\phi(x)$ which has $\mu_1$ and $\mu_3 \ne 0$ we can form a symmetrical
$\tfrac12\{\phi(x) + \phi(-x)\}$ which has the same $\mu_2$ and $\mu_4$ as $\phi(x)$. We shall prove that
the maximum $\sigma_y^2$ obtained from the symmetrical distribution is always lower than
that obtained from the skew.
Let the factor in curled brackets in (77) be $F_1$ for a skew distribution $\phi(x)$
and $F_2$ for the corresponding symmetrical distribution.
We then have

$$F_2 = \frac{\mu_2\mu_4 + (\mu_4 - 3\mu_2^2)\,x^2 + \mu_2\,x^4}{\mu_2\,(\mu_4 - \mu_2^2)}.$$

The condition for a maximum or minimum other than that at $x = 0$ is

$$3\mu_2^2 - \mu_4 > 0,$$

or $\beta_2 < 3$,
and as the denominator is positive we have in that case the maximum at $x = 0$.
It is thus clear that the maxima of $F_2$ between $-1$ and 1 must be either at $x = 0$
or at $x = \pm 1$.

We shall show that

$$[F_1]_{x=0} > [F_2]_{x=0},$$

and that either $[F_1]_{x=1}$ or $[F_1]_{x=-1} > [F_2]_{x=\pm1}$.

According to what has been proved in Section I (4) the coefficient of $x^4$ in (77) is
positive, the denominator of (77) is therefore positive and we have

$$[F_1 - F_2]_{x=0} = \frac{(\mu_2\mu_3 - \mu_1\mu_4)^2}{(\mu_4 - \mu_2^2)\,(\mu_2\mu_4 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4 - \mu_2^3)} > 0.$$
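We shall next compare $F_1$ and $F_2$ for $x = \pm 1$.

Putting $[F_2]_{x=\pm1} = \dfrac{\mathcal N}{D}$,
we have $[F_1]_{x=\pm1} = \dfrac{\mathcal N - \delta}{D - \epsilon}$,
where $\delta = \mu_3^2 - 2\mu_1\mu_3 + \mu_1^2 \pm 2\{\mu_3(1 - \mu_2) - \mu_1(\mu_2 - \mu_4)\}$

and $\epsilon = \mu_3^2 - 2\mu_1\mu_2\mu_3 + \mu_1^2\mu_4$.
For $\dfrac{\delta}{\epsilon}$ we find

$$\frac{\delta}{\epsilon} = \frac{(\mu_3 - \mu_1)^2 \pm 2\{\mu_3(1 - \mu_2) - \mu_1(\mu_2 - \mu_4)\}}{(\mu_3 - \mu_1\mu_2)^2 + \mu_1^2\,(\mu_4 - \mu_2^2)}.$$

Looking first at the case $\dfrac{\mu_1}{\mu_3} \ge 0$, we have

$$(\mu_3 - \mu_1)^2 < (\mu_3 - \mu_1\mu_2)^2,$$

and if we choose the value for which the other term of the numerator is $< 0$,

$$\frac{\delta}{\epsilon} < 1.$$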


When “ <0 we see, from considering the form 


5 y — Mi CL = Ba) = eas (E = pe) & 2 {ft (1 — oe) — Ha (He — Ha) 


2 


(Hs =e #1 fe)” Ls fi (Ma — #3) 
that for either'z = 1 orxg= — 1 


-<l. 
€ 



















As $\epsilon > 0$ we have hence for any $\mu_1$ and $\mu_3$, remembering that $\dfrac{\mathcal N}{D}$ being a squared
standard deviation multiplied by the number of observations is $\ge 1$,

$$\frac{\mathcal N - \delta}{D - \epsilon} > \frac{\mathcal N}{D},$$

that is, for either $x = 1$ or $-1$,
$$F_1 > F_2.$$

We have thus proved that the maxima of $F_2$ are below those of $F_1$.


(5) Our problem is hence reduced to finding the best curve among those represented by

$$\sigma_y^2 = \frac{\sigma^2}{N}\,\frac{1 + 2a\mu_2 + a^2\mu_4}{\mu_2\,(\mu_4 - \mu_2^2)}\,\{\mu_2\mu_4 + (\mu_4 - 3\mu_2^2)\,x^2 + \mu_2\,x^4\} \qquad (78).$$

As was stated in (2) of this section we get all sets of possible values for $\mu_2$
and $\mu_4$ from three groups of observations symmetrical about $x = 0$, and we may
therefore limit our search of the best distribution to these.

Let the observations be $\tfrac12\gamma N$ at $x = \pm v$, and $(1 - \gamma)N$ at $x = 0$. The interpolation
formula of Lagrange gives, when $\bar y_p$ represents the mean of the observations
at $x = p$,

$$y = \frac{x(x + v)}{2v^2}\,\bar y_v + \frac{x(x - v)}{2v^2}\,\bar y_{-v} - \frac{x^2 - v^2}{v^2}\,\bar y_0,$$

from which we find

$$\sigma_y^2 = \frac{\sigma^2}{N}\left\{\frac{(x^2 - v^2)^2}{v^4\,(1 - \gamma)} + \frac{x^2\,(x^2 + v^2)(1 + av^2)^2}{v^4\,\gamma}\right\} \qquad (79).$$
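It is obvious that if for a certain distribution we have

$$[\sigma_y^2]_{x=0} > [\sigma_y^2]_{x=1}$$

we can get a better distribution by taking more observations at 0. If on the other
hand

$$[\sigma_y^2]_{x=0} < [\sigma_y^2]_{x=1},$$

the curve cannot be the best unless $[\sigma_y^2]_{x=1}$ is a minimum for the present values of $v$
and $\gamma$. From (79) we find

$$\frac{d\,[\sigma_y^2]_{x=1}}{d\gamma} = \frac{\sigma^2}{N v^4}\left\{\frac{(1 - v^2)^2}{(1 - \gamma)^2} - \frac{(1 + v^2)(1 + av^2)^2}{\gamma^2}\right\} \qquad (80),$$

$$\frac{d\,[\sigma_y^2]_{x=1}}{dv^2} = -\frac{\sigma^2}{N v^6}\left\{\frac{2(1 - v^2)}{1 - \gamma} + \frac{(2 + v^2 - av^4)(1 + av^2)}{\gamma}\right\},$$

from which we obtain the conditions for maximum or minimum

$$v^2 = \frac{1 \pm 2\sqrt a}{a}$$

and

$$\frac{\gamma}{1 - \gamma} = \frac{2\sqrt a\,(1 \pm \sqrt a)^2}{a \mp 2\sqrt a - 1}.$$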
The lower sign requires $3 - 2\sqrt2 \le a \le \tfrac14$ and the upper sign $a \ge 3 + 2\sqrt2$* to
make $0 \le v^2 \le 1$. The case $a < \tfrac14$ has no interest, as we have seen that when $a < \tfrac13$
extrapolation is not even for a linear function advantageous. We have therefore
seen that for $a < 3 + 2\sqrt2$* $[\sigma_y^2]_{x=1}$ has no minimum and we have thus proved that
the best distribution requires $[\sigma_y^2]_{x=0} = [\sigma_y^2]_{x^2=1}$, that is

$$\frac{2v^2 - 1}{1 - \gamma} = \frac{(1 + v^2)(1 + av^2)^2}{\gamma}$$

or

$$\frac{1}{1 - \gamma} = 1 + \frac{(1 + v^2)(1 + av^2)^2}{2v^2 - 1} \qquad (81).$$

The maximum of the curve is

$$\frac{\sigma^2}{N}\cdot\frac{1}{1 - \gamma}.$$

To find the minimum of this value we differentiate (81) and get

$$\frac{d}{dv^2}\left(\frac{1}{1 - \gamma}\right) = \frac{1 + av^2}{(2v^2 - 1)^2}\,\{4av^4 - av^2 - 2a - 3\},$$

which is zero for

$$v^2 = \frac18\left(1 + \sqrt{33 + \frac{48}{a}}\right) \qquad (82)$$

and positive for greater $v^2$, so that we have found a minimum.

For $a = 3$ we find from (82) $v^2 = 1$, hence for $a \le 3$ we have to choose $v^2 = 1$,
from which, according to (81), follows

$$\frac{1}{1 - \gamma} = 1 + 2(1 + a)^2 \quad\text{or}\quad \gamma = \frac{2(1 + a)^2}{1 + 2(1 + a)^2}.$$


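When $3 + 2\sqrt2 > a > 3$, $\tfrac18\left(1 + \sqrt{33 + \tfrac{48}{a}}\right)$ is $< 1$, and for the corresponding
$\gamma$ we have

$$\frac{1}{1 - \gamma} = 1 + (1 + av^2)^2\,\frac{5a + 4 + \sqrt{a\,(33a + 48)}}{4\,(a + 2)} \qquad (83).$$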

Returning to the $\phi(x)$ distribution, which is found from this distribution by
dividing the frequencies by $k\,(1 + ax^2)^2$, we therefore find, when $\tfrac12\epsilon N$ is the number
of observations at $x = \pm v$ and $(1 - \epsilon)N$ that at $x = 0$,

$$\frac{\tfrac12\epsilon}{1 - \epsilon} = \frac{1 + v^2}{2\,(2v^2 - 1)}, \qquad \epsilon = \frac{1 + v^2}{3v^2}.$$

Hence

$$\mu_2 = \tfrac13\,(1 + v^2) \quad\text{and}\quad \mu_4 = \tfrac13\,(1 + v^2)\,v^2 \qquad (84).$$

* A further examination shows that for $a > 3 + 2\sqrt2$, $[\sigma_y^2]_{x=1}$ has a minimum but this is smaller than
$[\sigma_y^2]_{x=0}$ when $a < 6.7$. Up to this value we therefore have $[\sigma_y^2]_{x=0} = [\sigma_y^2]_{x=1}$ for the best curves. For $a > 6.7$ the
minimum of $[\sigma_y^2]_{x=1}$ determines the best distribution.


For $a \le 3$ we have found $v^2 = 1$ which according to (84) involves $\mu_2 = \mu_4$, so
that only the distribution above consisting of three groups can realise the requisite
conditions.

When $a > 3$ we have $v < 1$ and therefore $\mu_4 < \mu_2$, so that it must be possible
to satisfy the equation (84) by a continuous distribution of observations. However
$v$ is decreasing so slowly for increasing $a$ that practically the distribution determined
by (84) cannot differ much from three groups of observations.

Our results are accordingly that for a function of the second degree, of which the
standard deviation of the observations is $\sigma(1 + ax^2)$, we get the best function for $\sigma_y^2$,
when $a \le 3$ by taking three groups of observations at the middle and the ends of the
range, each group proportional to the squared standard deviation at the place, and when
$3 + 2\sqrt2 > a > 3$ by taking three groups of observations determined by (82) and (83).
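The recipe just stated can be evaluated numerically from (81) and (82). The Python sketch below is an illustration under the stated restriction $a < 3 + 2\sqrt2$; the function name `best_three_groups` and the choice to raise an error outside that interval are assumptions of the example, not part of the memoir.

```python
import math


def best_three_groups(a):
    """Return (v, gamma, max of sigma_y*sqrt(N)/sigma) for a parabola
    observed with standard deviation sigma*(1 + a x^2).

    gamma*N observations are split equally between x = +v and x = -v and
    (1 - gamma)*N are taken at x = 0.
    """
    if a >= 3 + 2 * math.sqrt(2):
        raise ValueError("the text treats only a < 3 + 2*sqrt(2) in this way")
    # v^2 = 1 up to a = 3, afterwards from (82)
    v2 = 1.0 if a <= 3 else (1 + math.sqrt(33 + 48 / a)) / 8
    inv_one_minus_gamma = 1 + (1 + v2) * (1 + a * v2) ** 2 / (2 * v2 - 1)  # (81)
    gamma = 1 - 1 / inv_one_minus_gamma
    return math.sqrt(v2), gamma, math.sqrt(inv_one_minus_gamma)


if __name__ == "__main__":
    for a in range(6):
        v, gamma, m = best_three_groups(a)
        print(f"a={a}: v={v:.4f}, gamma={gamma:.4f}, max={m:.3f}")
```

For $a = 0$ the three groups come out equal ($\gamma = \tfrac23$) and the maximum is $\sqrt3$; for $a = 4$ the sketch gives $v = .9816$ and a maximum of about $7.135$.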


(6) From (78) we find

$$[\sigma_y^2]_{x=0} = \frac{\sigma^2}{N}\,(1 + 2a\mu_2 + a^2\mu_4)\,\frac{\mu_4}{\mu_4 - \mu_2^2},$$

which, when $\mu_2$ and $\mu_4$ are found in accordance with (82) and (84), determines the
maximum $\sigma_y$ arrived at from our special three groups of observations. Besides the
numerical evaluation of this standard deviation, we give in Table IV below the
maximum of $\sigma_y$ obtained from a distribution for which $\phi(x)$ is constant from $-1$ to
1, that is, since, according to (67),

$$\frac1k = 1 + \tfrac23 a + \tfrac15 a^2,$$

the distribution

$$\psi(x) = \frac N2\,\frac{(1 + ax^2)^2}{1 + \tfrac23 a + \tfrac15 a^2}.$$

That maximum is determined by

$$\sigma_y^2 = \frac{\sigma^2}{N}\left(1 + \tfrac23 a + \tfrac15 a^2\right)\cdot 9,$$

$\dfrac{\sigma^2}{N}\cdot 9$ being the maximum $\sigma_y^2$ obtained from a rectangular distribution of observations
with the standard deviation $\sigma$.

The last column of the same table gives the maximum $\sigma_y$ arrived at when $\phi(x)$
is the rectangular distribution with clusters at $-1$ and 1 for which $[\sigma_y^2]_{x=0} = [\sigma_y^2]_{x^2=1}$. For
this distribution consisting of $.22026\,N$ observations at $+1$ and at $-1$ and
$.5595\,N = 2cN$ uniformly distributed from $-1$ to 1, we have found as given in
Table II (p. 50) the maximum $\dfrac{\sigma}{\sqrt N}\cdot 1.862$. Hence when $\mu_2$ and $\mu_4$ are the moment
coefficients of this $\phi(x)$ the maximum is found from

$$\sigma_y^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_2 + a^2\mu_4)\cdot 1.862^2.$$


We find $\mu_2 = .6270$, $\mu_4 = .5524$ and $\dfrac1k = 1 + 1.2540a + .5524a^2$.

The actual distribution is hence

$$\psi(x) = \frac{.27975\,(1 + ax^2)^2}{1 + 1.2540a + .5524a^2}\,N,$$

together with the clusters

$$\frac{.22026\,(1 + a)^2}{1 + 1.2540a + .5524a^2}\,N$$

at $-1$ and 1.

TABLE IV.

        Maximum of σ_y √N / σ    Maximum of σ_y √N / σ    Maximum of σ_y √N / σ for
  a     for the best             for distribution         distribution with φ(x) = c
        distribution             with φ(x) = N/2          and clusters at ±1

  0          1.732                    3.000                      1.862
  1          3.000                    4.099                      3.120
  2          4.359                    5.310                      4.453
  3          5.745                    6.573                      5.810
  4          7.135                    7.861                      7.178
  5          8.522                    9.165                      8.551














The difference between the first and second maxima taken as a proportion of the
first varies from 73 per cent. at $a = 0$ to 8 per cent. at $a = 5$, while the difference
between the first and the third maxima varies from 8 per cent. at $a = 0$ to 0.4
per cent. at $a = 5$. The continuous distribution with clusters is therefore
especially useful for smaller $a$.


For a = 4 we find from (82) v = .9816 and for a = 5, v = .9700. Both of these values of v are so close to 1 that if, instead of using them, we take the observations at 1 and −1 and let the numbers of the three groups of observations be proportional to the squared standard deviations, we get the maxima 7.141 and 8.544, which differ only quite insignificantly from the corresponding values of Table IV.


(7) For a function of the first degree, of which the standard deviation of the observations is $\sigma(1+ax)$, where $0 \le a < 1$, we have, according to (66) and (67),

$$\sigma_y^2 = \frac{\sigma^2}{N}\,(1+2a\mu_1+a^2\mu_2)\,\frac{\mu_2 - 2\mu_1x + x^2}{\mu_2-\mu_1^2} \qquad (85).$$

For $\mu_1 = -c^2$ the maximum of this function is at x = 1, and for $\mu_1 = c^2$ at −1.

As the maximum of $(\mu_2 - 2\mu_1x + x^2)$ has the same value in both cases it is clear that the negative $\mu_1$ gives the lower maximum for $\sigma_y^2$. We therefore only have to find the conditions for $[\sigma_y^2]_{x=1}$ being a minimum when $\mu_1 \le 0$.

We have

$$[\sigma_y^2]_{x=1} = \frac{\sigma^2}{N}\,\frac{\{(1+a\mu_1)^2 + a^2(\mu_2-\mu_1^2)\}\{(1-\mu_1)^2 + (\mu_2-\mu_1^2)\}}{\mu_2-\mu_1^2} \qquad (86),$$

and differentiating with regard to $\mu_2$,

$$\frac{d[\sigma_y^2]_{x=1}}{d\mu_2} = \frac{\sigma^2}{N}\,\frac{a^2(\mu_2-\mu_1^2)^2 - (1-\mu_1)^2(1+a\mu_1)^2}{(\mu_2-\mu_1^2)^2}.$$

As a < 1, we have $(1-\mu_1)(1+a\mu_1) > 0$ and

$$a(\mu_2-\mu_1^2) - (1-\mu_1)(1+a\mu_1) = (a\mu_2 - 1) + \mu_1(1-a) < 0,$$

from which it follows that

$$\frac{d[\sigma_y^2]_{x=1}}{d\mu_2} < 0$$

for any $\mu_1 \le 0$.

The greatest value $\mu_2$ can take for our range −1 to +1 is 1; the minimum of $[\sigma_y^2]_{x=1}$ must therefore be found for $\mu_2 = 1$, for which value (86) passes into

$$[\sigma_y^2]_{x=1} = \frac{\sigma^2}{N}\cdot\frac{2\,(1+2a\mu_1+a^2)}{1+\mu_1},$$

which, since $\mu_1 \le 0$, is a minimum and equals $\frac{\sigma^2}{N}\cdot 2(1+a^2)$ when $\mu_1 = 0$.
The $\phi(x)$ distribution ought accordingly to consist of two equally big groups at the ends of the range, and the actual distribution to be chosen for a function of the first degree, the standard deviation of which is a linear function of the variable, should be two groups at the ends of the working range with numbers proportional to the squared standard deviations at these places.


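(8) For a continuous distribution from −1 to 1 with frequencies proportional to the squared standard deviations we have

$$\mu_1 = 0 \quad\text{and}\quad \mu_2 = \tfrac13,$$

and the maximum

$$[\sigma_y^2]_{x=1} = \frac{\sigma^2}{N}\left(1+\frac{a^2}{3}\right)\cdot 4;$$

the actual distribution is

$$\psi(x) = \frac{(1+ax)^2}{1+\frac{a^2}{3}}\cdot\frac{N}{2}.$$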


Table V contains besides the maxima of $\sigma_y$ from these two distributions those obtained from a distribution for which $\phi(x)$ is constant with two additional clusters at −1 and 1 each consisting of ¼ of the observations.

The actual distribution is, since

$$\mu_2 = \tfrac12 + \tfrac16 = \tfrac23,$$

$$\psi(x) = \frac{(1+ax)^2}{1+\frac23 a^2}\cdot\frac{N}{4},$$

with $\dfrac{(1-a)^2}{1+\frac23 a^2}\cdot\dfrac{N}{4}$ observations at −1 and $\dfrac{(1+a)^2}{1+\frac23 a^2}\cdot\dfrac{N}{4}$ at +1 in addition.

The maximum of $\sigma_y^2$ is

$$\frac{\sigma^2}{N}\left(1+\tfrac23 a^2\right)\cdot\tfrac52.$$
TABLE V.

       Maximum of $\sigma_y\sqrt{N}/\sigma$
  a    for the best     for distribution      for distribution with
       distribution     with φ(x) = N/2       φ(x) = N/4 and clusters at ±1

  0       1.414             2.000                  1.581
 .1       1.421             2.003                  1.587
 .2       1.442             2.013                  1.602
 .3       1.477             2.030                  1.628
 .4       1.523             2.053                  1.663
 .5       1.581             2.082                  1.708
 .6       1.649             2.117                  1.761
 .7       1.726             2.157                  1.821
 .8       1.811             2.203                  1.889
 .9       1.903             2.254                  1.962
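The three columns of Table V can be recomputed from the maxima derived in (7) and (8): $2(1+a^2)$ for the two end groups, $4(1+a^2/3)$ for the continuous distribution, and $\tfrac52(1+\tfrac23a^2)$ for the distribution with clusters. The following is a minimal sketch with a hypothetical helper name, not part of the original memoir.

```python
import math


def table_v_row(a):
    """Return the three maxima of sigma_y * sqrt(N) / sigma for a given a (0 <= a < 1)."""
    best = math.sqrt(2.0 * (1.0 + a * a))                    # two groups at -1 and +1
    uniform = math.sqrt(4.0 * (1.0 + a * a / 3.0))           # phi(x) = N/2
    clustered = math.sqrt(2.5 * (1.0 + 2.0 * a * a / 3.0))   # phi(x) = N/4 plus clusters
    return best, uniform, clustered


if __name__ == "__main__":
    for tenth in range(10):
        a = tenth / 10.0
        print(a, *(round(value, 3) for value in table_v_row(a)))
```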

















(9) For a function of the second degree we found in (5) that when the standard deviation of the observations was $s_y = \sigma(1+ax^2)$ and $a \le 3$ it was advantageous to use the whole working range of observations; much more must this be the case when $s_y = \sigma(1+ax)$ and $0 \le a < 1$. We shall therefore try to find the three best groups of observations taken at −1, v, and 1, supposing v unknown. We do not venture to assert that another form of distribution might not lead to a curve of standard deviation with lower maximum, but the solution of the general problem would involve a more elaborate investigation into the possible variations of $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$ for distributions with limited range than seems desirable in this connection. We shall further limit our problem by assuming that the best distribution will be found among those which make $[\sigma_y^2]_{x=-1} = [\sigma_y^2]_{x=1}$ and both also equal to a maximum situated between x = −1 and x = 1. This would obviously be right if the maximum were found at x = v; this in fact is not the case, but still the maximum value is likely to be chiefly determined by the number of observations at x = v and there is therefore every reason to believe that our assumption is justifiable.


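Let there be $N\beta$ observations at −1, $N\gamma$ at 1 and $N(1-\beta-\gamma)$ at v. The interpolation formula of Lagrange then gives

$$y = \frac{(x-v)(x-1)}{2(1+v)}\,y_{-1} + \frac{(x-v)(x+1)}{2(1-v)}\,y_{1} + \frac{x^2-1}{v^2-1}\,y_v,$$

from which we find

$$\sigma_y^2 = \frac{\sigma^2}{N}\left\{\frac{(x-v)^2(x-1)^2}{4(1+v)^2}\cdot\frac{(1-a)^2}{\beta} + \frac{(x-v)^2(x+1)^2}{4(1-v)^2}\cdot\frac{(1+a)^2}{\gamma} + \frac{(x^2-1)^2}{(v^2-1)^2}\cdot\frac{(1+av)^2}{1-\beta-\gamma}\right\}.$$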








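The condition for $[\sigma_y^2]_{x=-1} = [\sigma_y^2]_{x=1}$ is

$$\frac{\beta}{(1-a)^2} = \frac{\gamma}{(1+a)^2}.$$

Eliminating $\beta$ we obtain for $\sigma_y^2 - [\sigma_y^2]_{x=1}$ the value

$$\sigma_y^2 - [\sigma_y^2]_{x=1} = \frac{\sigma^2}{N}\,\frac{(1+a)^2(x^2-1)}{(v^2-1)^2}\left\{\frac{(1+av)^2(x^2-1)}{(1+a)^2 - 2\gamma(1+a^2)} + \frac{1}{2\gamma}\left[(1+v^2)x^2 + 2v(1-v^2)x + 2 - 5v^2 + v^4\right]\right\}$$

or

$$\sigma_y^2 - [\sigma_y^2]_{x=1} = \frac{\sigma^2}{N}\,\frac{(1+a)^2(x^2-1)}{2\gamma\,(v^2-1)^2\left[(1+a)^2 - 2\gamma(1+a^2)\right]}\Big\{\left[(1+a)^2(1+v^2) - 2\gamma(a-v)^2\right]x^2$$
$$\qquad + 2v(1-v^2)\left[(1+a)^2 - 2\gamma(1+a^2)\right]x + (1+a)^2(2-5v^2+v^4) - 2\gamma\left[(1+a^2)(2-5v^2+v^4) + (1+av)^2\right]\Big\} \qquad (87).$$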


Our assumption that the maximum $\sigma_y^2$ shall be equal to $[\sigma_y^2]_{x=1}$ requires that the expression in curled brackets shall be a perfect square, for which the condition is

$$2\left[\frac{\gamma}{(1+a)^2}\right]^2\{a^2(1+a^2)v^6 + 2a(1+a^2)v^5 + (3-a^2-3a^4)v^4 - 4a(3+2a^2)v^3 + (-2+9a^2+5a^4)v^2 + 2a(3+a^2)v - a^2(3+2a^2)\}$$
$$\quad + \frac{\gamma}{(1+a)^2}\{-a^2v^6 - 2av^5 + (-5+2a^2)v^4 + 12av^3 - (2+9a^2)v^2 - 2av + 3 + 4a^2\} + v^4 + 2v^2 - 1 = 0 \qquad (88).$$

Now $[\sigma_y^2]_{x=1} = \dfrac{\sigma^2}{N}\cdot\dfrac{(1+a)^2}{\gamma}$ is the maximum which we want to make as low as possible; hence we have for a certain a to find the v for which $\dfrac{\gamma}{(1+a)^2}$ as given by (88) is a maximum.
We shall examine the cases a = .5 and a = .9.

(10) For a = .5 (88) takes the form

$$\left[\frac{\gamma}{(1+a)^2}\right]^2\{.625v^6 + 2.5v^5 + 5.125v^4 - 14v^3 + 1.125v^2 + 6.5v - 1.75\}$$
$$\quad + \frac{\gamma}{(1+a)^2}\{-.25v^6 - v^5 - 4.5v^4 + 6v^3 - 4.25v^2 - v + 4\} + v^4 + 2v^2 - 1 = 0,$$

which differentiated with regard to v gives

$$\left[\frac{\gamma}{(1+a)^2}\right]^2\{3.75v^5 + 12.5v^4 + 20.5v^3 - 42v^2 + 2.25v + 6.5\}$$
$$\quad + \frac{\gamma}{(1+a)^2}\{-1.5v^5 - 5v^4 - 18v^3 + 18v^2 - 8.5v - 1\} + 4v(v^2+1) = 0.$$

We find that these two equations have for v = −.190 the root $\dfrac{\gamma}{(1+a)^2} = .2936$ in common, which represents a maximum.

The maximum of the curve is hence $\dfrac{\sigma^2}{N}\cdot 3.405$, which value occurs for x = ±1 and for x = .064 determined by (87).

The distribution of observations is
.6607 N at 1,
.0734 N at −1,
and .2659 N at −.190.
For comparison we shall consider what would result from taking for the $\phi(x)$ distribution three equally big groups of observations at −1, 0 and 1. This would for observations with the constant error $\sigma$ make the maximum of the curve equal to $\dfrac{\sigma^2}{N}\cdot 3$, and that multiplied by

$$1 + 2a\mu_1 + a^2\mu_2 = \frac{3.5}{3}$$

gives $\dfrac{\sigma^2}{N}\cdot 3.5$.

The actual distribution $\psi(x)$ would be
.6429 N at 1,
.0714 N at −1,
and .2857 N at 0.

This last distribution only makes the maximum $\sigma_y^2$ about 3 per cent. greater than the value which we obtained by our special distribution and it will therefore for most practical cases be as useful.
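The comparison made in (10) can be checked numerically by evaluating the Lagrange variance curve for a parabola through three group means and scanning it over the range. The sketch below does this for a = .5 with the frequencies quoted in the text; the function names and the grid search are illustrative assumptions, and the frequencies are only given to four decimals, so agreement is to about the third figure.

```python
def variance_curve(x, points, fractions, a):
    """N * sigma_y^2 / sigma^2 at x for a parabola through three group means.

    points    -- the three abscissae of the groups
    fractions -- share of the N observations in each group
    a         -- coefficient in the standard deviation sigma * (1 + a * x)
    """
    total = 0.0
    for i, (xi, share) in enumerate(zip(points, fractions)):
        weight = 1.0  # Lagrange basis polynomial of group i evaluated at x
        for j, xj in enumerate(points):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += weight ** 2 * (1.0 + a * xi) ** 2 / share
    return total


def max_variance(points, fractions, a, steps=4000):
    """Largest value of the variance curve on an even grid over [-1, 1]."""
    grid = (-1.0 + 2.0 * k / steps for k in range(steps + 1))
    return max(variance_curve(x, points, fractions, a) for x in grid)


special = max_variance([1.0, -1.0, -0.190], [0.6607, 0.0734, 0.2659], 0.5)
simple = max_variance([1.0, -1.0, 0.0], [0.6429, 0.0714, 0.2857], 0.5)

if __name__ == "__main__":
    print(round(special, 3), round(simple, 3))
```

The ratio of the two maxima is close to the 3 per cent. difference stated in the text.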


(11) When a = .9 we find for (88),

$$\left[\frac{\gamma}{(1+a)^2}\right]^2\{2.9322v^6 + 6.516v^5 + .4434v^4 - 33.264v^3 + 17.141v^2 + 13.716v - 7.4844\}$$
$$\quad + \frac{\gamma}{(1+a)^2}\{-.81v^6 - 1.8v^5 - 3.38v^4 + 10.8v^3 - 9.29v^2 - 1.8v + 6.24\} + v^4 + 2v^2 - 1 = 0,$$

which differentiated with regard to v gives

$$\left[\frac{\gamma}{(1+a)^2}\right]^2\{17.5932v^5 + 32.58v^4 + 1.7736v^3 - 99.792v^2 + 34.282v + 13.716\}$$
$$\quad + \frac{\gamma}{(1+a)^2}\{-4.86v^5 - 9v^4 - 13.52v^3 + 32.4v^2 - 18.58v - 1.8\} + 4v(v^2+1) = 0.$$

For v = −.354 these two equations have the root $\dfrac{\gamma}{(1+a)^2} = .23214$ in common, which is therefore the maximum of $\dfrac{\gamma}{(1+a)^2}$.

The maximum of the corresponding $\sigma_y^2$ is hence

$$\frac{\sigma^2}{N}\cdot\frac{(1+a)^2}{\gamma} = \frac{\sigma^2}{N}\cdot 4.308.$$

From (87) we find that it occurs at x = .125 as well as at x = ±1. The distribution of observations is then

.8380 N at 1,
.0023 N at −1,
and .1597 N at −.354.

Comparing again with a distribution consisting of three groups of observations at −1, 0 and 1 with frequencies proportional to the squared standard deviations at these places we find that the distribution would be

.7814 N at 1,
.0022 N at −1,
and .2164 N at 0,

and the maximum of $\sigma_y^2$ would be

$$\frac{\sigma^2}{N}\,(1 + 2a\mu_1 + a^2\mu_2)\cdot 3 = \frac{\sigma^2}{N}\cdot 4.62.$$

We thus find that by our special distribution the maximum of $\sigma_y^2$ was 7 per cent. lower; the choice of that distribution would thus permit us to reduce the total number of observations at the same rate without raising the maximum of $\sigma_y^2$.


(12) The result of these investigations is that the maximum $\sigma_y$ obtained from the best three groups of observations differs so little from that obtained from three groups at −1, 0 and 1 that the first grouping only in quite exceptional practice would be preferred.

We shall therefore in Table VI give the maximum $\sigma_y$ arrived at from the following three distributions: (1) three groups of observations at −1, 0 and 1 in numbers proportional to the squared standard deviations at these places, (2) a distribution for which $\phi(x) = \frac{N}{2}$ and (3) a distribution for which $\phi(x) = .2797\,N$ with additional clusters .2203 N at ±1 (see Table II, p. 50).

TABLE VI.

       Maximum of $\sigma_y\sqrt{N}/\sigma$
  a    from three groups    from distribution     from distribution for       from best
       at 0 and ±1          for which             which φ(x) = .2797 N        three groups
                            φ(x) = N/2            and clusters at ±1

  0       1.732                3.000                  1.862
 .1       1.738                3.005                  1.868
 .2       1.755                3.020                  1.886
 .3       1.783                3.045                  1.914
 .4       1.822                3.079                  1.954
 .5       1.871                3.122                  2.003
 .6       1.929                3.175                  2.062
 .7       1.995                3.236                  2.129
 .8       2.069                3.304                  2.205
 .9       2.149                3.381                  2.287

Both in Table V and in Table VI the difference between the two first maxima as a proportion of the first decreases with increasing a, so that the distribution with uniform $\phi(x)$ is more profitable for a > 0 than for observations with constant errors.


VIII. Best distribution of observations for determining a single constant of the function.


(1) Our choice of observations has hitherto aimed at giving within the working range of observations a determination of the function as accurate and uniform as possible. We shall now consider what is the best choice of observations for determining a single constant of the function. The investigations will be carried out for functions of the first and of the second degree for which the standard deviations of the observations are

$$s_y = \sigma(1+ax^2),$$

or

$$s_y = \sigma(1+ax), \qquad 1 > a \ge 0.$$

We have in (3) of Section I given the formula (8) for $\sigma_{a_m}^2$, and shall here give only the form to which it is transferred by putting

$$\psi(x) = k\,\phi(x)\,f(x), \qquad \frac{1}{k} = \frac{1}{N}\int \phi(x)\,f(x)\,dx.$$

The formula analogous to that given for $\sigma_y^2$ (66) is

$$\begin{vmatrix}
\dfrac{kN\,\sigma_{a_m}^2}{\sigma^2} & 0 & 0 & 0 & \cdots \\
0 & 1 & \mu_1 & \mu_2 & \cdots \\
0 & \mu_1 & \mu_2 & \mu_3 & \cdots \\
0 & \mu_2 & \mu_3 & \mu_4 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \\
1 & \mu_m & \mu_{m+1} & \mu_{m+2} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \\
0 & \mu_n & \mu_{n+1} & \mu_{n+2} & \cdots
\end{vmatrix} = 0 \qquad (89),$$

the first row having, like the first column, the element 1 in the place belonging to $a_m$ and 0 elsewhere.

(2) For a function of the first degree

$$y = a_0 + a_1x,$$

for which the standard deviation of an observation is

$$s_y = \sigma(1+ax^2),$$

and therefore

$$\frac{1}{k} = 1 + 2a\mu_2 + a^2\mu_4,$$

we find, according to (89),

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_2 + a^2\mu_4)\left(1 + \frac{\mu_1^2}{\mu_2-\mu_1^2}\right) \qquad (90)$$

and

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_2 + a^2\mu_4)\,\frac{1}{\mu_2-\mu_1^2} \qquad (91).$$


As for any skew distribution of observations we can find a corresponding symmetrical distribution with the same $\mu_2$ and $\mu_4$, both these expressions are a minimum for $\mu_1 = 0$.

We have already shown in (2) of Section VII that any possible values of $\mu_2$ and $\mu_4$ can be produced by three symmetrical groups of observations, so that by introducing the variables v and $\gamma$ determined by

$$\mu_2 = \gamma v^2 \quad\text{and}\quad \mu_4 = \gamma v^4,$$

and limited by

$$v^2 \le 1, \qquad 0 \le \gamma \le 1,$$

we do not leave out any possibilities.

From (90) we then get

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}\,(1 + 2a\gamma v^2 + a^2\gamma v^4),$$

which for a > 0 is a minimum when $\gamma v^2 = 0$, and for a = 0 is $\frac{\sigma^2}{N}$ for any $\gamma$ and $v^2$.

For a < 0 we find, since

$$\frac{d\sigma_{a_0}^2}{dv^2} = \frac{\sigma^2}{N}\cdot 2a\gamma\,(1+av^2) \quad\text{and}\quad v^2 < -\frac{1}{a},$$

that for a constant $\gamma$, $\sigma_{a_0}^2$ has the least value when $v^2$ is as great as possible, that is for $v^2 = 1$.

The minimum of $\sigma_{a_0}^2$ is then

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}\,\{1 + (2+a)\,a\gamma\},$$

which, since a(2 + a) < 0, is a minimum when $\gamma$ takes its greatest possible value 1.

The minimum is thus

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}\,(1+a)^2.$$

Hence we conclude that:

when a > 0, $\sigma_{a_0}^2$ is a minimum and equal to $\frac{\sigma^2}{N}$ for N observations at x = 0,

when a = 0, $\sigma_{a_0}^2$ is a minimum and equal to $\frac{\sigma^2}{N}$ for any distribution for which $\mu_1 = 0$,

and when a < 0, $\sigma_{a_0}^2$ is a minimum and equal to $\frac{\sigma^2}{N}(1+a)^2$ for two equally big groups of observations at ±1.
(3) When we introduce $\mu_1 = 0$, $\mu_2 = \gamma v^2$ and $\mu_4 = \gamma v^4$ in (91) we get

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}\cdot\frac{1 + 2a\gamma v^2 + a^2\gamma v^4}{\gamma v^2}.$$

This for constant $v^2$ is a minimum when $\gamma = 1$ and then equal to

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}\cdot\frac{1}{v^2}\,(1 + 2av^2 + a^2v^4) \qquad (92).$$

$$\frac{d\sigma_{a_1}^2}{dv^2} = \frac{\sigma^2}{N}\left(a^2 - \frac{1}{v^4}\right):$$

$v^2 = \frac{1}{a}$ when possible, that is for $a \ge 1$, determines a minimum, while for a < 1, $\sigma_{a_1}^2$ reaches its lowest value for $v^2 = 1$. From (92) we find for $a \ge 1$ the minimum

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}\cdot 4a,$$

and for a < 1 the minimum

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}\,(1+a)^2,$$

both formulae giving $\sigma_{a_1}^2 = \frac{\sigma^2}{N}\cdot 4$ for a = 1.

Our results are accordingly:

when a > 1, $\sigma_{a_1}^2$ is a minimum and equal to $\frac{\sigma^2}{N}\cdot 4a$ for two equally big groups of observations at $x = \pm\frac{1}{\sqrt a}$ or for any distribution with the same $\mu_2$ and $\mu_4$,

and when $a \le 1$, $\sigma_{a_1}^2$ is a minimum and equal to $\frac{\sigma^2}{N}(1+a)^2$ for two equally big groups of observations at x = ±1.

We see that for a = 0 two equally big groups of observations at ±1 make both $\sigma_{a_0}^2$ and $\sigma_{a_1}^2$ minima, and these groups in addition form the distribution for which $\sigma_y^2$ has the lowest maximum within the possible range of observations.
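The two regimes found in (3) for the slope can be illustrated by a direct search over $v^2$ with $\gamma = 1$, using $N\sigma_{a_1}^2/\sigma^2 = (1 + 2av^2 + a^2v^4)/v^2$. This is only a numerical sketch; the function names and the grid resolution are assumptions made for the illustration.

```python
def slope_variance(a, v_squared, gamma=1.0):
    """N * sigma_{a1}^2 / sigma^2 for symmetric groups with mu2 = gamma*v^2, mu4 = gamma*v^4."""
    mu2 = gamma * v_squared
    mu4 = gamma * v_squared * v_squared
    return (1.0 + 2.0 * a * mu2 + a * a * mu4) / mu2


def best_v_squared(a, steps=100000):
    """Grid search for the v^2 in (0, 1] that minimises the slope variance (gamma = 1)."""
    candidates = (k / steps for k in range(1, steps + 1))
    return min(candidates, key=lambda v_squared: slope_variance(a, v_squared))


if __name__ == "__main__":
    for a in (0.5, 1.0, 4.0):
        v_sq = best_v_squared(a)
        print(a, v_sq, slope_variance(a, v_sq))
```

For a = 4 the search settles at $v^2 = 1/4$ with value 4a = 16, while for a = .5 it stays at the end of the range with value $(1+a)^2 = 2.25$.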

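(4) For a function of the second degree

$$y = a_0 + a_1x + a_2x^2,$$

with the standard deviations of observations

$$s_y = \sigma(1+ax^2), \qquad a > -1,$$

and therefore $\frac{1}{k} = 1 + 2a\mu_2 + a^2\mu_4$, we find, from (89),

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_2 + a^2\mu_4)\,\frac{\mu_2\mu_4 - \mu_3^2}{\mu_2\mu_4 - \mu_2^3 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4} \qquad (93),$$

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_2 + a^2\mu_4)\,\frac{\mu_4 - \mu_2^2}{\mu_2\mu_4 - \mu_2^3 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4} \qquad (94),$$

and

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_2 + a^2\mu_4)\,\frac{\mu_2 - \mu_1^2}{\mu_2\mu_4 - \mu_2^3 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4} \qquad (95).$$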
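We shall prove that the last factor of each of these formulae is a minimum for $\mu_1 = \mu_3 = 0$.

To prove this for (93) we consider the difference

$$\frac{\mu_2\mu_4 - \mu_3^2}{\mu_2\mu_4 - \mu_2^3 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4} - \frac{\mu_4}{\mu_4-\mu_2^2} = \frac{(\mu_2\mu_3 - \mu_1\mu_4)^2}{(\mu_4-\mu_2^2)\left[(\mu_2-\mu_1^2)(\mu_4-\mu_2^2) - (\mu_3-\mu_1\mu_2)^2\right]},$$

from which follows

$$\frac{\mu_2\mu_4 - \mu_3^2}{\mu_2\mu_4 - \mu_2^3 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4} \ge \frac{\mu_2\mu_4}{\mu_2\mu_4 - \mu_2^3}.$$

For (94) it is at once clear that

$$\frac{\mu_4-\mu_2^2}{\mu_2\mu_4 - \mu_2^3 - \left[(\mu_3-\mu_1\mu_2)^2 + \mu_1^2(\mu_4-\mu_2^2)\right]} \ge \frac{\mu_4-\mu_2^2}{\mu_2\mu_4 - \mu_2^3}.$$

For the case of (95) we compare

$$\frac{\mu_2-\mu_1^2}{\mu_2\mu_4 - \mu_2^3 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4} \quad\text{and}\quad \frac{\mu_2}{\mu_2\mu_4 - \mu_2^3},$$

and we find the difference

$$\frac{(\mu_3-\mu_1\mu_2)^2}{(\mu_4-\mu_2^2)\left[(\mu_2-\mu_1^2)(\mu_4-\mu_2^2) - (\mu_3-\mu_1\mu_2)^2\right]} \ge 0,$$

and hence

$$\frac{\mu_2-\mu_1^2}{\mu_2\mu_4 - \mu_2^3 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4} \ge \frac{\mu_2}{\mu_2\mu_4 - \mu_2^3}.$$

It is thus proved for the three formulae that a distribution of observations for which $\mu_1 = \mu_3 = 0$ gives lower values than any distribution with the same $\mu_2$ and $\mu_4$ as the former and with $\mu_1 \gtrless 0$, $\mu_3 \gtrless 0$.

Hence our problem is reduced to finding the $\mu_2$ and $\mu_4$ which make the following expressions minima:

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_2 + a^2\mu_4)\,\frac{\mu_4}{\mu_4-\mu_2^2} \qquad (96),$$

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_2 + a^2\mu_4)\,\frac{1}{\mu_2} \qquad (97),$$

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_2 + a^2\mu_4)\,\frac{1}{\mu_4-\mu_2^2} \qquad (98).$$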


(5) Introducing $\mu_2 = \gamma v^2$ and $\mu_4 = \gamma v^4$ in (96) we get

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}\left\{1 + \frac{\gamma}{1-\gamma}\,(1+av^2)^2\right\},$$

which is seen to be $> \frac{\sigma^2}{N}$ except when $\gamma = 0$.

Hence the minimum value of $\sigma_{a_0}^2 = \frac{\sigma^2}{N}$ can only be obtained by taking all the observations at x = 0.

(97) is identical with (91) for $\mu_1 = 0$. The conditions for a minimum of $\sigma_{a_1}^2$ are therefore the same for a function of the second degree as for a function of the first degree. That is, when a > 1, $\sigma_{a_1}^2$ is a minimum and equal to $\frac{\sigma^2}{N}\cdot 4a$ for two equally big groups of observations at $x = \pm\frac{1}{\sqrt a}$, or for any distribution with the same $\mu_2$ and $\mu_4$, and when $a \le 1$, $\sigma_{a_1}^2$ is a minimum and equal to $\frac{\sigma^2}{N}(1+a)^2$ for two equally big groups of observations at x = ±1.

With the variates $\gamma$ and v (98) takes the form

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}\cdot\frac{1 + 2a\gamma v^2 + a^2\gamma v^4}{\gamma(1-\gamma)\,v^4}.$$

By differentiating with regard to $v^2$ we get

$$\frac{d\sigma_{a_2}^2}{dv^2} = \frac{\sigma^2}{N}\cdot\frac{2\,(-1 - a\gamma v^2)}{\gamma(1-\gamma)\,v^6},$$

which is negative for any a, v and $\gamma$ within our limits.

For constant $\gamma$, $\sigma_{a_2}^2$ is therefore least when $v^2 = 1$ and the minimum value is

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}\left(\frac{1}{\gamma} + 2a + a^2\right)\frac{1}{1-\gamma} \qquad (99).$$

This is again a minimum when

$$\frac{d\sigma_{a_2}^2}{d\gamma} = \frac{\sigma^2}{N}\cdot\frac{1}{\gamma^2(1-\gamma)^2}\,\{a(2+a)\gamma^2 + 2\gamma - 1\} = 0,$$

that is for $\gamma = \frac{1}{2+a}$, which gives a minimum both for positive and negative a.

Thus the distribution that makes $\sigma_{a_2}^2$ a minimum has a $\phi(x)$-distribution consisting of $\frac{N}{2(2+a)}$ observations at −1 and 1 and $\frac{1+a}{2+a}N$ observations at 0.

We have $\mu_2 = \mu_4 = \frac{1}{2+a}$ and $\frac{1}{k} = 1 + a$.

The relation $\psi(x) = k\,\phi(x)\,f(x)$ then gives us

$$\psi(0) = \frac{N}{2+a} \quad\text{and}\quad \psi(\pm1) = \frac{(1+a)\,N}{2\,(2+a)}.$$

From (99) we find the minimum value

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}\,(2+a)^2.$$

Our result is thus that $\sigma_{a_2}^2$ is a minimum and equal to $\frac{\sigma^2}{N}(2+a)^2$ for a distribution consisting of $\frac{N}{2+a}$ observations at 0 and $\frac{(1+a)N}{2(2+a)}$ at −1 and 1.
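The minimum $(2+a)^2$ for the coefficient of $x^2$, reached at $\gamma = 1/(2+a)$ and $v^2 = 1$, can be confirmed by a coarse grid search over $\gamma$ and $v^2$ using the form of (98) in those variables. The sketch below is illustrative only; the function names and grid size are assumptions.

```python
def a2_variance(a, gamma, v_squared):
    """N * sigma_{a2}^2 / sigma^2 for symmetric groups, s_y = sigma * (1 + a * x^2)."""
    mu2 = gamma * v_squared
    mu4 = gamma * v_squared * v_squared
    return (1.0 + 2.0 * a * mu2 + a * a * mu4) / (mu4 - mu2 * mu2)


def grid_minimum(a, steps=200):
    """Return (value, gamma, v_squared) at the smallest variance found on the grid."""
    best = None
    for i in range(1, steps):            # 0 < gamma < 1
        gamma = i / steps
        for j in range(1, steps + 1):    # 0 < v^2 <= 1
            v_squared = j / steps
            value = a2_variance(a, gamma, v_squared)
            if best is None or value < best[0]:
                best = (value, gamma, v_squared)
    return best


if __name__ == "__main__":
    print(grid_minimum(0.5))    # expected near ((2.5)**2, 0.4, 1.0)
```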


(6) When the standard deviation of an observation is

$$s_y = \sigma(1+ax) \quad\text{and}\quad 0 \le a < 1,$$

we have $\frac{1}{k} = 1 + 2a\mu_1 + a^2\mu_2$, and according to (89) we find for a function of the first degree

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_1 + a^2\mu_2)\,\frac{\mu_2}{\mu_2-\mu_1^2} \qquad (100)$$

and

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_1 + a^2\mu_2)\,\frac{1}{\mu_2-\mu_1^2} \qquad (101).$$

By differentiating (100) we find

$$\frac{d\sigma_{a_0}^2}{d\mu_1} = \frac{\sigma^2}{N}\cdot\frac{2\mu_2\,(1+a\mu_1)(\mu_1+a\mu_2)}{(\mu_2-\mu_1^2)^2}$$

and

$$\frac{d\sigma_{a_0}^2}{d\mu_2} = \frac{\sigma^2}{N}\cdot\frac{(a\mu_2 - 2a\mu_1^2 - \mu_1)(\mu_1+a\mu_2)}{(\mu_2-\mu_1^2)^2}.$$

Both of these can only be zero when

$$\mu_1 + a\mu_2 = 0 \qquad (102),$$

which is seen to determine a minimum of $\sigma_{a_0}^2$, the value of which is $\frac{\sigma^2}{N}$. The condition $\mu_1 = -a\mu_2$ can be fulfilled by an infinity of different distributions. From $0 \le \mu_2 \le 1$ follows the condition $0 \ge \mu_1 \ge -a$.

We shall confine our attention to those distributions which consist of two groups of observations. Let there be $N\gamma$ observations at $v_1$ and $(1-\gamma)N$ at $v_2$; we then have

$$\mu_1 = v_2 + \gamma\,(v_1 - v_2),$$
$$\mu_2 = v_2^2 + \gamma\,(v_1^2 - v_2^2),$$

from which by means of (102) is found

$$\gamma = \frac{-v_2(1+av_2)}{(v_1-v_2)\{1+a(v_1+v_2)\}}, \qquad 1-\gamma = \frac{v_1(1+av_1)}{(v_1-v_2)\{1+a(v_1+v_2)\}},$$

and

$$\frac{1}{k} = \frac{(1+av_1)(1+av_2)}{1+a(v_1+v_2)}.$$

Thus we find that the $\phi(x)$-distribution consists of

$$\frac{-v_2(1+av_2)}{(v_1-v_2)\{1+a(v_1+v_2)\}}\,N \text{ at } v_1 \quad\text{and}\quad \frac{v_1(1+av_1)}{(v_1-v_2)\{1+a(v_1+v_2)\}}\,N \text{ at } v_2,$$

while the actual distribution

$$\psi(x) = \frac{1+a(v_1+v_2)}{(1+av_1)(1+av_2)}\,(1+ax)^2\,\phi(x)$$

consists of

$$\frac{-v_2(1+av_1)}{v_1-v_2}\,N \text{ at } v_1 \quad\text{and}\quad \frac{v_1(1+av_2)}{v_1-v_2}\,N \text{ at } v_2 \qquad (103).$$

We thus see that for any two points $v_1$ and $v_2$ of which one is negative and the other positive we can choose the numbers of observations so as to make $\sigma_{a_0}^2 = \frac{\sigma^2}{N}$, as it of course would be by taking a single group of observations at x = 0.
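The two-group result of (6) is easy to verify directly: with the frequencies $-v_2(1+av_1)/(v_1-v_2)$ at $v_1$ and $v_1(1+av_2)/(v_1-v_2)$ at $v_2$, the intercept of the line through the two group means has variance exactly $\sigma^2/N$. The sketch below is a small check under that reading of (103); the function names are hypothetical.

```python
def two_group_design(v1, v2, a):
    """Shares of the N observations placed at v1 and v2 (one positive, one negative)."""
    share_1 = -v2 * (1.0 + a * v1) / (v1 - v2)
    share_2 = v1 * (1.0 + a * v2) / (v1 - v2)
    return share_1, share_2


def intercept_variance(v1, v2, share_1, share_2, a):
    """N * sigma_{a0}^2 / sigma^2 for the line through the two group means.

    The intercept is (v1 * y2 - v2 * y1) / (v1 - v2), and each group mean has
    variance sigma^2 * (1 + a * v)^2 / (share * N).
    """
    var_1 = (1.0 + a * v1) ** 2 / share_1
    var_2 = (1.0 + a * v2) ** 2 / share_2
    return (v2 * v2 * var_1 + v1 * v1 * var_2) / (v1 - v2) ** 2


if __name__ == "__main__":
    shares = two_group_design(0.8, -0.3, 0.5)
    print(shares, intercept_variance(0.8, -0.3, shares[0], shares[1], 0.5))
```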
(7) By differentiating (101) we get

$$\frac{d\sigma_{a_1}^2}{d\mu_1} = \frac{\sigma^2}{N}\cdot\frac{2\,(\mu_1+a\mu_2)(1+a\mu_1)}{(\mu_2-\mu_1^2)^2} \qquad (104)$$

and

$$\frac{d\sigma_{a_1}^2}{d\mu_2} = -\frac{\sigma^2}{N}\cdot\frac{(1+a\mu_1)^2}{(\mu_2-\mu_1^2)^2}.$$

As the latter is always negative, $\sigma_{a_1}^2$ is for constant $\mu_1$ least when $\mu_2$ has its greatest value, that is 1.

Introducing this in (104) we get as condition for a minimum,

$$\mu_1 + a = 0.$$

There is only one distribution for which $\mu_2 = 1$ and $\mu_1 = -a$, and it is that consisting of two groups of observations at −1 and 1 included in the distributions examined in (6).

From (103) we find that the actual distribution consists of $\frac{1-a}{2}N$ observations at −1 and $\frac{1+a}{2}N$ at 1. The minimum of $\sigma_{a_1}^2$ is from (101) found to be $\frac{\sigma^2}{N}$.

The minimum $\frac{\sigma^2}{N}$ of $\sigma_{a_1}^2$ can thus only be obtained by taking two groups of observations at the limits of the range with numbers proportional to the standard deviation of observations at these places. This distribution makes also $\sigma_{a_0}^2$ a minimum, but it is not, except when a = 0, the distribution which gives $\sigma_y$ the lowest maximum value within the possible range of observations.


(8) For a function of the second degree,

$$y = a_0 + a_1x + a_2x^2,$$

with the standard deviation

$$s_y = \sigma(1+ax), \quad\text{where } 0 \le a < 1,$$

we have $\frac{1}{k} = 1 + 2a\mu_1 + a^2\mu_2$, and from (89),

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_1 + a^2\mu_2)\,\frac{\mu_2\mu_4 - \mu_3^2}{\mu_2\mu_4 - \mu_2^3 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4} \qquad (105),$$

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_1 + a^2\mu_2)\,\frac{\mu_4 - \mu_2^2}{\mu_2\mu_4 - \mu_2^3 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4} \qquad (106),$$

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}\,(1 + 2a\mu_1 + a^2\mu_2)\,\frac{\mu_2 - \mu_1^2}{\mu_2\mu_4 - \mu_2^3 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4} \qquad (107).$$

(105) may be brought into the form

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}\left[1 + \frac{\mu_2\,(\mu_2\mu_4-\mu_3^2)\left(a+\dfrac{\mu_1}{\mu_2}\right)^2 + \dfrac{(\mu_2^2-\mu_1\mu_3)^2}{\mu_2}}{\mu_2\mu_4 - \mu_2^3 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4}\right],$$

where the denominator and $\mu_2\mu_4 - \mu_3^2$ are always positive. Hence the condition for $\sigma_{a_0}^2$ taking its minimum value $\frac{\sigma^2}{N}$ is

$$\mu_1 + a\mu_2 = 0 \quad\text{and}\quad \mu_2^2 - \mu_1\mu_3 = 0$$

or

$$\frac{\mu_2}{\mu_1} = \frac{\mu_3}{\mu_2} = -\frac{1}{a} \qquad (108).$$

We shall examine the possible distributions consisting of three groups of observations with the frequencies $\gamma_1$, $\gamma_2$ and $\gamma_3$ at $v_1$, $v_2$ and $v_3$. The conditions (108) require

$$\frac{\gamma_1v_1^2 + \gamma_2v_2^2 + \gamma_3v_3^2}{\gamma_1v_1 + \gamma_2v_2 + \gamma_3v_3} = \frac{\gamma_1v_1^3 + \gamma_2v_2^3 + \gamma_3v_3^3}{\gamma_1v_1^2 + \gamma_2v_2^2 + \gamma_3v_3^2} = \frac{\gamma_2v_2^2(v_2-v_1) + \gamma_3v_3^2(v_3-v_1)}{\gamma_2v_2(v_2-v_1) + \gamma_3v_3(v_3-v_1)} = -\frac{1}{a}$$

or

$$\frac{\gamma_1v_1(1+av_1)}{v_2-v_3} = \frac{\gamma_2v_2(1+av_2)}{v_3-v_1} = \frac{\gamma_3v_3(1+av_3)}{v_1-v_2} \qquad (109).$$

Now $\dfrac{v_1}{v_2-v_3}$, $\dfrac{v_2}{v_3-v_1}$ and $\dfrac{v_3}{v_1-v_2}$ can never all have the same sign and $(1+av)$ is for any $v \ge -1$ positive, from which it follows that (109) leads to negative frequencies. Nor can (109) be satisfied by two groups of observations, as $\gamma_1 = 0$ requires $v_2 = v_3 = 0$, that is one group of observations at x = 0, which of course gives $\sigma_{a_0}^2$ equal to $\frac{\sigma^2}{N}$.

(9) We may write (106)

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}\left(\frac{1}{\mu_2} + \frac{(\mu_4-\mu_2^2)(\mu_1+a\mu_2)^2 + (\mu_3-\mu_1\mu_2)^2}{\mu_2\left[(\mu_4-\mu_2^2)(\mu_2-\mu_1^2) - (\mu_3-\mu_1\mu_2)^2\right]}\right),$$

where the last ratio is seen to be positive unless

$$\mu_1 + a\mu_2 = 0 \quad\text{and}\quad \mu_3 - \mu_1\mu_2 = 0 \qquad (110).$$

If therefore any distribution of observations can give $\frac{1}{\mu_2}$ its minimum value 1 and at the same time fulfil those conditions it will make $\sigma_{a_1}^2$ a minimum and equal to $\frac{\sigma^2}{N}$. But $\mu_2 = 1$ together with (110) lead to

$$\mu_1 = \mu_3 = -a,$$

which require $\frac{1+a}{2}N$ observations at −1 and $\frac{1-a}{2}N$ at 1, whereas the actual distribution must consist of $\frac{1-a}{2}N$ observations at −1 and $\frac{1+a}{2}N$ at 1.

Thus the only distribution which makes $\sigma_{a_1}^2$ a minimum and equal to $\frac{\sigma^2}{N}$ is that consisting of $\frac{1-a}{2}N$ observations at −1 and $\frac{1+a}{2}N$ at 1.
(10) The general minimum conditions for $\sigma_{a_2}$ cannot be found without more elaborate investigations into the possible variations of the moment coefficients than are at present available, and we shall limit our research to the case of three groups of observations.

Let us suppose $\gamma_1N$, $\gamma_2N$ and $(1-\gamma_1-\gamma_2)N$ observations taken at $x_1$, $x_2$ and $x_3$, and let the corresponding means be $y_1$, $y_2$ and $y_3$.

We then find, when

$$A = (x_1-x_2)(x_2-x_3)(x_3-x_1),$$

$$a_2 = \frac{1}{A}\{y_1(x_3-x_2) + y_2(x_1-x_3) + y_3(x_2-x_1)\},$$

and

$$\sigma_{a_2}^2 = \frac{\sigma^2}{A^2N}\left\{\frac{(x_3-x_2)^2(1+ax_1)^2}{\gamma_1} + \frac{(x_1-x_3)^2(1+ax_2)^2}{\gamma_2} + \frac{(x_2-x_1)^2(1+ax_3)^2}{1-\gamma_1-\gamma_2}\right\} \qquad (111).$$

Differentiations first with regard to $\gamma_1$ and then with regard to $\gamma_2$ give the minimum conditions

$$\frac{\gamma_1^2}{(x_3-x_2)^2(1+ax_1)^2} = \frac{\gamma_2^2}{(x_1-x_3)^2(1+ax_2)^2} = \frac{(1-\gamma_1-\gamma_2)^2}{(x_2-x_1)^2(1+ax_3)^2},$$

or, when we suppose $x_1 < x_2 < x_3$,

$$\frac{\gamma_1}{(x_3-x_2)(1+ax_1)} = \frac{\gamma_2}{(x_3-x_1)(1+ax_2)} = \frac{1-\gamma_1-\gamma_2}{(x_2-x_1)(1+ax_3)} = \frac{1}{2(x_3-x_1)(1+ax_2)} \qquad (112).$$

With these values for $\gamma_1$ and $\gamma_2$ we get from (111)

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}\left\{\frac{2(1+ax_2)}{(x_2-x_1)(x_3-x_2)}\right\}^2.$$

This for constant $x_2$ is obviously a minimum for $x_1 = -1$ and $x_3 = 1$ and is then equal to

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}\left\{\frac{2(1+ax_2)}{1-x_2^2}\right\}^2.$$

From this we find

$$\frac{d\sigma_{a_2}}{dx_2} = \frac{\sigma}{\sqrt N}\cdot\frac{2\,(ax_2^2 + 2x_2 + a)}{(1-x_2^2)^2},$$

which shows that

$$x_2 = -\frac{1}{a}\left(1 - \sqrt{1-a^2}\right)$$

determines a minimum.

The minimum value is

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}\left(1 + \sqrt{1-a^2}\right)^2,$$

and the frequencies found from (112) are

$\frac{1}{4a}\sqrt{1-a}\left(\sqrt{1+a} - \sqrt{1-a}\right)N$ at −1,

$\frac{1}{4a}\sqrt{1+a}\left(\sqrt{1+a} - \sqrt{1-a}\right)N$ at 1,

and $\frac12 N$ at $x = -\frac{1}{a}\left(1 - \sqrt{1-a^2}\right)$.
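As a check on (10), the three-group design can be built from the closed forms for the middle abscissa and the frequencies, and its variance evaluated from (111) directly. The sketch below assumes 0 < a < 1 (the formulae divide by a) and uses hypothetical function names.

```python
import math


def curvature_design(a):
    """Middle abscissa and the three shares (at -1, x2, +1) for 0 < a < 1."""
    root = math.sqrt(1.0 - a * a)
    x2 = -(1.0 - root) / a
    diff = math.sqrt(1.0 + a) - math.sqrt(1.0 - a)
    share_low = math.sqrt(1.0 - a) * diff / (4.0 * a)    # at x = -1
    share_high = math.sqrt(1.0 + a) * diff / (4.0 * a)   # at x = +1
    return x2, (share_low, 0.5, share_high)


def curvature_variance(points, shares, a):
    """N * sigma_{a2}^2 / sigma^2 from formula (111) for groups at x1 < x2 < x3."""
    x1, x2, x3 = points
    g1, g2, g3 = shares
    big_a = (x1 - x2) * (x2 - x3) * (x3 - x1)
    total = ((x3 - x2) ** 2 * (1.0 + a * x1) ** 2 / g1
             + (x1 - x3) ** 2 * (1.0 + a * x2) ** 2 / g2
             + (x2 - x1) ** 2 * (1.0 + a * x3) ** 2 / g3)
    return total / big_a ** 2


if __name__ == "__main__":
    a = 0.5
    x2, shares = curvature_design(a)
    print(x2, shares, curvature_variance((-1.0, x2, 1.0), shares, a))
```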





IX. Adjustment with regard to both of two variates connected by a linear relation.

(1) The case often occurs when both of the variates observed have errors of observations of the same order so that adjustment only of one of them is unsatisfactory. We shall therefore in this section consider adjustment with regard to both of the variates and give the adjusted relation between them and the standard deviations of the constants.

Let x′ be observed with the standard deviation $\sqrt{\alpha}\,\sigma$ and y′ with the standard deviation $\sqrt{\gamma}\,\sigma$; we shall then for the sake of greater perspicuity exchange the variates for $x = \dfrac{x'}{\sqrt\alpha}$ and $y = \dfrac{y'}{\sqrt\gamma}$ so that both of our variates have the same standard deviation $\sigma$. Let $\frac{1}{N}\sum\{x^sy^t\}$ taken over the N pairs of observations be denoted by $\mu_{st}$; we then find, by adjusting only the y's according to (3),

$$\begin{vmatrix} y & 1 & x \\ \mu_{01} & 1 & \mu_{10} \\ \mu_{11} & \mu_{10} & \mu_{20} \end{vmatrix} = 0$$

or

$$y - \mu_{01} = \frac{\mu_{11} - \mu_{01}\mu_{10}}{\mu_{20} - \mu_{10}^2}\,(x - \mu_{10}) \qquad (113).$$

By adjusting only the x's we get

$$y - \mu_{01} = \frac{\mu_{02} - \mu_{01}^2}{\mu_{11} - \mu_{01}\mu_{10}}\,(x - \mu_{10}) \qquad (114),$$

which only coincide with (113) when

$$(\mu_{20} - \mu_{10}^2)(\mu_{02} - \mu_{01}^2) = (\mu_{11} - \mu_{01}\mu_{10})^2,$$

that is when there is perfect correlation between x and y and no casual errors of observation.


(2) Adjusting at the same time with regard to x and y may be transformed to the problem of finding the straight line for which the sum of the squared distances of the observed points (x, y) is a minimum.

Let the line sought be

$$x\cos v + y\sin v + p = 0.$$

The sum which we want to make a minimum is then

$$S = \mu_{20}\cos^2v + \mu_{02}\sin^2v + 2\mu_{11}\cos v\sin v + 2p\mu_{10}\cos v + 2p\mu_{01}\sin v + p^2,$$

$\dfrac{dS}{dp} = 0$ requires $p = -\mu_{10}\cos v - \mu_{01}\sin v$, indicating that the line passes through the mean $(\mu_{10}, \mu_{01})$; this determines a minimum for constant v.

The corresponding S is

$$S = (\mu_{20} - \mu_{10}^2)\cos^2v + (\mu_{02} - \mu_{01}^2)\sin^2v + 2(\mu_{11} - \mu_{01}\mu_{10})\cos v\sin v \qquad (115),$$

which differentiated with regard to v gives

$$\frac{dS}{dv} = -\{\mu_{20} - \mu_{10}^2 - (\mu_{02} - \mu_{01}^2)\}\sin 2v + 2(\mu_{11} - \mu_{01}\mu_{10})\cos 2v.$$

It thus follows that

$$\tan 2v = \frac{2(\mu_{11} - \mu_{01}\mu_{10})}{\mu_{20} - \mu_{10}^2 - (\mu_{02} - \mu_{01}^2)} = \frac{2\tan v}{1 - \tan^2v}$$

or

$$\tan v = \frac{\tfrac12\left\{\mu_{02} - \mu_{01}^2 - (\mu_{20} - \mu_{10}^2) \pm \sqrt{[\mu_{02} - \mu_{01}^2 - (\mu_{20} - \mu_{10}^2)]^2 + 4[\mu_{11} - \mu_{10}\mu_{01}]^2}\right\}}{\mu_{11} - \mu_{01}\mu_{10}} \qquad (116)$$

determine a maximum and a minimum of S.

Substituting in (115) we find

$$S = \tfrac12\left\{\mu_{20} - \mu_{10}^2 + \mu_{02} - \mu_{01}^2 \pm \sqrt{[\mu_{20} - \mu_{10}^2 - (\mu_{02} - \mu_{01}^2)]^2 + 4[\mu_{11} - \mu_{01}\mu_{10}]^2}\right\},$$

so that the minimum corresponds to the negative sign of the root in (116).

The adjusted function connecting x and y is hence a line through the general mean forming an angle u with the x-axis which is determined by

$$\tan u = -\cot v = \frac{\mu_{02} - \mu_{01}^2 - (\mu_{20} - \mu_{10}^2) + \sqrt{[\mu_{20} - \mu_{10}^2 - (\mu_{02} - \mu_{01}^2)]^2 + 4[\mu_{11} - \mu_{01}\mu_{10}]^2}}{2(\mu_{11} - \mu_{01}\mu_{10})}.$$

For the variates x′ and y′ there must to this value of the tangent be added the factor $\sqrt{\dfrac{\gamma}{\alpha}}$; expressed by the moment coefficients of x′ and y′ we therefore find

$$\tan u = \frac{\alpha(\mu_{02} - \mu_{01}^2) - \gamma(\mu_{20} - \mu_{10}^2) + \sqrt{[\gamma(\mu_{20} - \mu_{10}^2) - \alpha(\mu_{02} - \mu_{01}^2)]^2 + 4\alpha\gamma[\mu_{11} - \mu_{01}\mu_{10}]^2}}{2\alpha(\mu_{11} - \mu_{01}\mu_{10})}.$$




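(3) We shall prove that the line is situated between the two regression curves (113) and (114).

Making $(\mu_{10}, \mu_{01})$ the zero point of the coordinates, the three tangents to be compared are

$$\frac{\mu_{11}}{\mu_{20}}, \qquad \frac{\mu_{02}}{\mu_{11}} \qquad\text{and}\qquad \frac{1}{2\mu_{11}}\left\{\mu_{02} - \mu_{20} + \sqrt{(\mu_{02} - \mu_{20})^2 + 4\mu_{11}^2}\right\} = \tan u,$$

where the μ's now are the moment coefficients about the mean.

According to $\mu_{11} \gtrless 0$ we have

$$\frac{\mu_{11}}{\mu_{20}} \lessgtr \frac{\mu_{02}}{\mu_{11}},$$

since $\mu_{11}^2 < \mu_{20}\,\mu_{02}$.

As $\sqrt{(\mu_{02} - \mu_{20})^2 + 4\mu_{11}^2} < \mu_{02} + \mu_{20}$, we have $\tan u \lessgtr \dfrac{\mu_{02}}{\mu_{11}}$.

It rests to compare $\tan u$ and $\dfrac{\mu_{11}}{\mu_{20}}$; we find

$$\tan u - \frac{\mu_{11}}{\mu_{20}} = \frac{1}{2\mu_{11}}\left\{\mu_{02} - \mu_{20} - \frac{2\mu_{11}^2}{\mu_{20}} + \sqrt{\left(\mu_{02} - \mu_{20} - \frac{2\mu_{11}^2}{\mu_{20}}\right)^2 + \frac{4\mu_{11}^2}{\mu_{20}^2}\,(\mu_{02}\mu_{20} - \mu_{11}^2)}\right\}.$$

The factor in curled brackets is hence positive and we have $\tan u >$ or $< \dfrac{\mu_{11}}{\mu_{20}}$ according as $\mu_{11} >$ or $< 0$; we have thus proved that

$$\frac{\mu_{11}}{\mu_{20}} \lessgtr \tan u \lessgtr \frac{\mu_{02}}{\mu_{11}}.$$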
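The slope of the line adjusted with regard to both variates, and its position between the two regression slopes, can be illustrated with a few lines of code working from the moment coefficients about the mean. The data below are made up purely for illustration, and the function names are assumptions, not notation from the paper.

```python
import math


def central_moments(xs, ys):
    """Return (mu20, mu02, mu11) about the mean for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mu20 = sum((x - mean_x) ** 2 for x in xs) / n
    mu02 = sum((y - mean_y) ** 2 for y in ys) / n
    mu11 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return mu20, mu02, mu11


def orthogonal_slope(mu20, mu02, mu11):
    """tan u for the line minimising the squared perpendicular distances (mu11 != 0)."""
    root = math.sqrt((mu02 - mu20) ** 2 + 4.0 * mu11 ** 2)
    return (mu02 - mu20 + root) / (2.0 * mu11)


# Hypothetical observations with equal error variance in x and y.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]
mu20, mu02, mu11 = central_moments(xs, ys)
slope = orthogonal_slope(mu20, mu02, mu11)

if __name__ == "__main__":
    print(mu11 / mu20, slope, mu02 / mu11)
```

With positive $\mu_{11}$ the printed values appear in increasing order, the adjusted slope lying between the regression of y on x and that of x on y.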
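(4) In order to find the standard deviations of the constants of the line we shall express the observations, the standard deviations of which are $\sqrt\alpha\,\sigma$ and $\sqrt\gamma\,\sigma$, by a parameter r to get an equation for each observation.

Suppose

$$x_i = a + r_i\cos u,$$
$$y_i = b + r_i\sin u,$$

and suppose we have a good approximation for $a, b, u, r_1, r_2 \ldots r_N$ from which is calculated x and y corresponding to the observations. The differences between observed and calculated x and y can then be expressed by

$$\Delta x_i = \Delta a - r_i\sin u\,\Delta u + \cos u\,\Delta r_i,$$
$$\Delta y_i = \Delta b + r_i\cos u\,\Delta u + \sin u\,\Delta r_i \qquad (119),$$

and we can carry out an adjustment, $\Delta a, \Delta b, \Delta u, \Delta r_1, \Delta r_2 \ldots \Delta r_N$ being the elements.

The normal equations are:

$$\frac{1}{\alpha}\sum\{\Delta x_i\} = \frac{N}{\alpha}\Delta a + 0\cdot\Delta b - \sum\{r_i\}\frac{\sin u}{\alpha}\Delta u + \frac{\cos u}{\alpha}\Delta r_1 + \ldots + \frac{\cos u}{\alpha}\Delta r_N,$$

$$\frac{1}{\gamma}\sum\{\Delta y_i\} = 0\cdot\Delta a + \frac{N}{\gamma}\Delta b + \sum\{r_i\}\frac{\cos u}{\gamma}\Delta u + \frac{\sin u}{\gamma}\Delta r_1 + \ldots + \frac{\sin u}{\gamma}\Delta r_N,$$

$$\sum\left\{r_i\left[-\frac{\sin u}{\alpha}\Delta x_i + \frac{\cos u}{\gamma}\Delta y_i\right]\right\} = -\frac{\sin u}{\alpha}\sum\{r_i\}\Delta a + \frac{\cos u}{\gamma}\sum\{r_i\}\Delta b + \sum\{r_i^2\}\left(\frac{\sin^2u}{\alpha} + \frac{\cos^2u}{\gamma}\right)\Delta u$$
$$\qquad + r_1\left(\frac{1}{\gamma} - \frac{1}{\alpha}\right)\cos u\sin u\,\Delta r_1 + \ldots + r_N\left(\frac{1}{\gamma} - \frac{1}{\alpha}\right)\cos u\sin u\,\Delta r_N,$$

$$\frac{\cos u}{\alpha}\Delta x_1 + \frac{\sin u}{\gamma}\Delta y_1 = \frac{\cos u}{\alpha}\Delta a + \frac{\sin u}{\gamma}\Delta b + r_1\left(\frac{1}{\gamma} - \frac{1}{\alpha}\right)\cos u\sin u\,\Delta u + \left(\frac{\cos^2u}{\alpha} + \frac{\sin^2u}{\gamma}\right)\Delta r_1 + \ldots + 0\cdot\Delta r_N,$$

$$\ldots\ldots$$

$$\frac{\cos u}{\alpha}\Delta x_N + \frac{\sin u}{\gamma}\Delta y_N = \frac{\cos u}{\alpha}\Delta a + \frac{\sin u}{\gamma}\Delta b + r_N\left(\frac{1}{\gamma} - \frac{1}{\alpha}\right)\cos u\sin u\,\Delta u + 0\cdot\Delta r_1 + \ldots + \left(\frac{\cos^2u}{\alpha} + \frac{\sin^2u}{\gamma}\right)\Delta r_N.$$

Eliminating $r_1, r_2 \ldots r_N$ from the first and the third of these equations by means of the last N equations, we obtain

$$\sum\{\sin u\,\Delta x_i - \cos u\,\Delta y_i\} = N\sin u\,\Delta a - N\cos u\,\Delta b - \sum\{r_i\}\Delta u \qquad (120),$$

and

$$\sum\{r_i[\sin u\,\Delta x_i - \cos u\,\Delta y_i]\} = \sum\{r_i\}\sin u\,\Delta a - \sum\{r_i\}\cos u\,\Delta b - \sum\{r_i^2\}\Delta u \qquad (121).$$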
By eliminating the r's from the second of the normal equations we get an equation identical with (120), which shows that we have one more element than we can determine.

From (120) and (121) we are however able to find $(\sin u\,\Delta a - \cos u\,\Delta b)$ and $\Delta u$; we get

$$\sin u\,\Delta a - \cos u\,\Delta b = \frac{1}{N(m_2 - m_1^2)}\sum\{(m_2 - m_1r_i)(\sin u\,\Delta x_i - \cos u\,\Delta y_i)\}$$

and

$$\Delta u = \frac{1}{N(m_2 - m_1^2)}\sum\{(m_1 - r_i)(\sin u\,\Delta x_i - \cos u\,\Delta y_i)\},$$

where $m_1 = \frac{1}{N}\sum\{r_i\}$ and $m_2 = \frac{1}{N}\sum\{r_i^2\}$.

For a point of the adjusted line corresponding to $r_q$ we find, according to (119),

$$p_q = \sin u\,\Delta x_q - \cos u\,\Delta y_q = \sin u\,\Delta a - \cos u\,\Delta b - r_q\Delta u.$$

The standard deviation of $p_q$ is seen to be the standard deviation of the position of the adjusted point $(x_q, y_q)$ in the direction at right angles to the line.

We find

$$p_q = \frac{1}{N(m_2 - m_1^2)}\sum\{[m_2 - m_1r_i - r_q(m_1 - r_i)](\sin u\,\Delta x_i - \cos u\,\Delta y_i)\}$$

and

$$\sigma_{p_q}^2 = \frac{\sigma^2}{N}\,(\alpha\sin^2u + \gamma\cos^2u)\left\{1 + \frac{(r_q - m_1)^2}{m_2 - m_1^2}\right\}.$$

This standard deviation is quite analogous to that obtained for an adjusted ordinate when the abscissa is errorless and gives the same indications for the distribution of the observations.

For $\sigma_u$ we find

$$\sigma_u^2 = \frac{\sigma^2\,(\alpha\sin^2u + \gamma\cos^2u)}{N(m_2 - m_1^2)},$$

again emphasising that the standard deviation of the r's ought to be a maximum to give the best determination of the line.


In conclusion I should like to express my thanks to Miss H. Gertrude Jones 
for the care she has devoted to the preparation of the diagrams in this paper. 











ON THE PRODUCT-MOMENTS OF VARIOUS ORDERS OF 
THE NORMAL CORRELATION SURFACE OF TWO 
VARIATES. 


By K. PEARSON AND A. W. YOUNG.


(1) In several recent investigations we have found it desirable to have the
values of product-moment coefficients about the mean of the normal correlation
surface. The present paper deals with the case of two variates. If the correlation
surface be
$$z = \frac{N}{2\pi\sigma_x\sigma_y\sqrt{1-r^2}}\,\exp\left\{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_x^2} - \frac{2rxy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right)\right\} \tag{i}$$
where $\sigma_x$ and $\sigma_y$ are the standard deviations of the two variates x and y and r their
correlation, then we define the sth-tth product-moment coefficient to be
$$q_{s,t} = \frac{1}{N}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} z\,x^s y^t\,dx\,dy \tag{ii}$$
Further we write
$$q_{s,t} = p_{s,t}\,\sigma_x^s\,\sigma_y^t \tag{iii}$$


so that $p_{s,t}$ is a purely numerical quantity and a function of the variable r only.
Clearly from the symmetry of the surface
$$p_{2s,\,2t+1} = p_{2s+1,\,2t} = 0.$$
We are accordingly only concerned with cases in which s + t is even.

We propose therefore first to give the general algebraical expressions for the
lower values of $p_{s,t}$, and secondly to provide tables for the numerical values of
these product moments proceeding by increments of .05 in r.

Since s + t must be even if $p_{s,t}$ be not zero, it follows that s and t must either
both be even or both be odd. In the former case $p_{s,t}$ does not alter when r changes
sign; in the latter case $p_{s,t}$ for negative r is simply $p_{s,t}$ for positive r with the sign
changed. It is accordingly only needful to table $p_{s,t}$ for positive values of r.

For the purpose of testing computations the following formulae are of value: 


$$p_{s,t} = (s+t-1)\,r\,p_{s-1,t-1} + (s-1)(t-1)(1-r^2)\,p_{s-2,t-2} \tag{iv}$$
$$p_{s,t} = (t-1)\,p_{s,t-2} + s\,r\,p_{s-1,t-1} = (s-1)\,p_{s-2,t} + t\,r\,p_{s-1,t-1} \tag{v}$$
Or, again we may write hg WE a wv cnssccdipabrvdedaariaasedentantes (vi), 
and we have od ee) Pe Meng eer neem rerener (vil), 


which is capable of numerical evaluation in a single machine operation. 
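The reduction formulae lend themselves to a short numerical check. The following is a minimal sketch, assuming the standardised coefficient obeys p(s,t) = (t-1) p(s,t-2) + s r p(s-1,t-1) with p(0,0) = 1 and p = 0 when s + t is odd; the function names `p` and `rhs_iv` are illustrative only and do not come from the paper.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def p(s, t, r):
    """Standardised product-moment p_{s,t} for correlation r.

    Built up with p_{s,t} = (t-1) p_{s,t-2} + s r p_{s-1,t-1}.
    """
    if s < 0 or t < 0 or (s + t) % 2:
        return 0.0          # negative orders and odd s+t contribute nothing
    if s == 0 and t == 0:
        return 1.0
    if t == 0:
        # pure moment of one variate: p_{s,0} = (s-1) p_{s-2,0}
        return (s - 1) * p(s - 2, 0, r)
    return (t - 1) * p(s, t - 2, r) + s * r * p(s - 1, t - 1, r)


def rhs_iv(s, t, r):
    """Right-hand side of the two-step reduction (valid for s, t >= 1)."""
    return ((s + t - 1) * r * p(s - 1, t - 1, r)
            + (s - 1) * (t - 1) * (1 - r * r) * p(s - 2, t - 2, r))


if __name__ == "__main__":
    r = 0.3
    print(p(2, 2, r), 1 + 2 * r * r)       # both forms of p_{2,2}
    print(p(4, 4, r), rhs_iv(4, 4, r))     # the two reductions agree
```

Either reduction can therefore serve as a test of the other when a table is being computed.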
The general values for any normal product-moment coefficient are 





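$$p_{2s,\,2t} = \frac{(2s)!\,(2t)!}{2^{s+t}}\sum_{u=0}^{\min(s,t)}\frac{(2r)^{2u}}{(s-u)!\,(t-u)!\,(2u)!} \tag{viii}$$
$$p_{2s+1,\,2t+1} = \frac{(2s+1)!\,(2t+1)!}{2^{s+t}}\;r\sum_{u=0}^{\min(s,t)}\frac{(2r)^{2u}}{(s-u)!\,(t-u)!\,(2u+1)!} \tag{ix}$$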
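As an illustration, the general series may be evaluated directly. The sketch below assumes the factorial form p(2s,2t) = (2s)!(2t)!/2^(s+t) × Σ (2r)^(2u) / ((s-u)!(t-u)!(2u)!), with the analogous odd-odd form carrying a factor r and (2u+1)! in the denominator, the sums running to the lesser of s and t. The names `p_even` and `p_odd` are hypothetical helpers, not the authors' notation.

```python
from math import factorial


def p_even(s, t, r):
    """p_{2s,2t}: even-even standardised product-moment from the series."""
    pref = factorial(2 * s) * factorial(2 * t) / 2 ** (s + t)
    total = sum((2 * r) ** (2 * u)
                / (factorial(s - u) * factorial(t - u) * factorial(2 * u))
                for u in range(min(s, t) + 1))
    return pref * total


def p_odd(s, t, r):
    """p_{2s+1,2t+1}: odd-odd standardised product-moment from the series."""
    pref = factorial(2 * s + 1) * factorial(2 * t + 1) / 2 ** (s + t)
    total = sum((2 * r) ** (2 * u)
                / (factorial(s - u) * factorial(t - u) * factorial(2 * u + 1))
                for u in range(min(s, t) + 1))
    return pref * r * total


if __name__ == "__main__":
    r = 0.5
    print(p_even(1, 1, r))   # p_{2,2} = 1 + 2 r^2
    print(p_odd(1, 1, r))    # p_{3,3} = 3 r (3 + 2 r^2)
```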
(2) We are now in a position to set down the algebraical values of the product-
moment coefficients.
(a) s or t = 0.
$$p_{0,2t} = p_{2t,0} = 1\cdot3\cdot5\cdots(2t-1),$$
$p_{0,0} = 1$, $p_{2,0} = 1$, $p_{4,0} = 3$, $p_{6,0} = 15$, $p_{8,0} = 105$, $p_{10,0} = 945$.
These are of course the simple moment coefficients of the normal curve when
the unit of abscissal length is the standard deviation.
(b) s or t = 1.
$$p_{1,2t+1} = p_{2t+1,1} = 1\cdot3\cdot5\cdots(2t+1)\,r,$$
$p_{1,1} = r$, $p_{3,1} = 3r$, $p_{5,1} = 15r$, $p_{7,1} = 105r$, $p_{9,1} = 945r$.
Generally $p_{2t+1,1}/p_{2t+2,0} = r$ and provides a means of finding r and testing how
far the correlation of two variates is normal.
(c) s or t = 2.
$$p_{2,2t} = p_{2t,2} = 1\cdot3\cdot5\cdots(2t-1)\,\{1 + 2tr^2\},$$
$p_{2,2} = 1 + 2r^2$, $p_{2,4} = 3(1 + 4r^2)$, $p_{2,6} = 15(1 + 6r^2)$,
$p_{2,8} = 105(1 + 8r^2)$, $p_{2,10} = 945(1 + 10r^2)$.
(d) s or t = 3.
$$p_{3,2t+1} = p_{2t+1,3} = 1\cdot3\cdot5\cdots(2t+1)\,r\,\{3 + 2tr^2\},$$
$p_{3,3} = 3r(3 + 2r^2)$, $p_{3,5} = 15r(3 + 4r^2)$, $p_{3,7} = 105r(3 + 6r^2)$, $p_{3,9} = 945r(3 + 8r^2)$.
(e) s or t = 4.
$$p_{4,2t} = p_{2t,4} = 1\cdot3\cdot5\cdots(2t-1)\,\{3 + 12tr^2 + 4t(t-1)r^4\},$$
$p_{4,4} = 3(3 + 24r^2 + 8r^4)$, $p_{4,6} = 15(3 + 36r^2 + 24r^4)$,
$p_{4,8} = 105(3 + 48r^2 + 48r^4)$, $p_{4,10} = 945(3 + 60r^2 + 80r^4)$.
(f) s or t = 5.
$$p_{5,2t+1} = p_{2t+1,5} = 1\cdot3\cdot5\cdots(2t+1)\,r\,\{15 + 20tr^2 + 4t(t-1)r^4\},$$
$p_{5,5} = 15r(15 + 40r^2 + 8r^4)$, $p_{5,7} = 105r(15 + 60r^2 + 24r^4)$,
$p_{5,9} = 945r(15 + 80r^2 + 48r^4)$.
(g) s or t = 6.
$$p_{6,2t} = p_{2t,6} = 1\cdot3\cdot5\cdots(2t-1)\,\{15 + 90tr^2 + 60t(t-1)r^4 + 8t(t-1)(t-2)r^6\},$$
$p_{6,6} = 15(15 + 270r^2 + 360r^4 + 48r^6)$, $p_{6,8} = 105(15 + 360r^2 + 720r^4 + 192r^6)$,
$p_{6,10} = 945(15 + 450r^2 + 1200r^4 + 480r^6)$.
(h) s or t = 7.
$$p_{7,2t+1} = p_{2t+1,7} = 1\cdot3\cdot5\cdots(2t+1)\,r\,\{105 + 210tr^2 + 84t(t-1)r^4 + 8t(t-1)(t-2)r^6\},$$
$p_{7,7} = 105r(105 + 630r^2 + 504r^4 + 48r^6)$,
$p_{7,9} = 945r(105 + 840r^2 + 1008r^4 + 192r^6)$.
(i) s or t = 8.
$$p_{8,2t} = p_{2t,8} = 1\cdot3\cdot5\cdots(2t-1)\,\{105 + 840tr^2 + 840t(t-1)r^4 + 224t(t-1)(t-2)r^6 + 16t(t-1)(t-2)(t-3)r^8\},$$
$p_{8,8} = 105(105 + 3360r^2 + 10080r^4 + 5376r^6 + 384r^8)$,
$p_{8,10} = 945(105 + 4200r^2 + 16800r^4 + 13440r^6 + 1920r^8)$.
(j) s or t = 9.
$$p_{9,2t+1} = p_{2t+1,9} = 1\cdot3\cdot5\cdots(2t+1)\,r\,\{945 + 2520tr^2 + 1512t(t-1)r^4 + 288t(t-1)(t-2)r^6 + 16t(t-1)(t-2)(t-3)r^8\},$$
$p_{9,9} = 945r(945 + 10080r^2 + 18144r^4 + 6912r^6 + 384r^8)$.
(k) s or t = 10.
$$p_{10,2t} = p_{2t,10} = 1\cdot3\cdot5\cdots(2t-1)\,\{945 + 9450tr^2 + 12600t(t-1)r^4 + 5040t(t-1)(t-2)r^6 + 720t(t-1)(t-2)(t-3)r^8 + 32t(t-1)(t-2)(t-3)(t-4)r^{10}\},$$
$p_{10,10} = 945(945 + 47250r^2 + 252000r^4 + 302400r^6 + 86400r^8 + 3840r^{10})$.
The table on pp. 90-1 gives the numerical values of these coefficients. We 
proceed to illustrate their use. 


Illustration I. In discussing the relation of auricular height (y) with age (x)
of a girl's head a sample of 2272 individuals was found to provide the following
product-moment coefficients:

$q_{1,1}$ = 3.113,712, $q_{3,1}$ = 74.447,616,

$q_{2,1}$ = -1.957,022, $q_{4,1}$ = -108.701,559.
Are these incompatible with normal correlation? (See K. Pearson, On the General
Theory of Skew Correlation and non-linear Regression, Drapers' Company Research
Memoirs, Biometric Series II, p. 35.) We have

$\sigma_x$ = 3.064,819, $\sigma_y$ = 3.454,125,
and r = .294,128,
and the leading subscript above corresponds to the x coordinate. We need first
the values of $q_{2,1}$, $q_{3,1}$ and $q_{4,1}$ on the hypothesis of normality. Clearly $q_{2,1}$ and $q_{4,1}$
will be zero, and using linear interpolation:
$q_{3,1} = \sigma_x^3\sigma_y\,p'_{3,1}$
= 99.437,979 x .88256
= 87.759,983 = 87.7600, say.

In the next place we require the probable errors of these q's. The general
expression for the probable error of a product-moment about the mean is given
in Biometrika, Vol. IX, p. 3. In our present notation it is
$$N\sigma^2_{q_{s,t}} = q_{2s,2t} - q_{s,t}^2 + s^2 q_{2,0}\,q_{s-1,t}^2 + t^2 q_{0,2}\,q_{s,t-1}^2 + 2st\,q_{1,1}\,q_{s-1,t}\,q_{s,t-1} - 2s\,q_{s+1,t}\,q_{s-1,t} - 2t\,q_{s,t+1}\,q_{s,t-1}.$$


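Now remembering that for a normal distribution q vanishes when s + t is odd,
and that $q_{4,0} = 3\sigma_x^4$, while $q_{s,t} = 10^{n}\,p'_{s,t}\,\sigma_x^s\sigma_y^t$, $10^n$ being the power of ten by which the tabled value $p'_{s,t}$ has been reduced,
we have
$$N\sigma^2_{q_{2,1}} = q_{4,2} + 4q_{2,0}\,q_{1,1}^2 + q_{0,2}\,q_{2,0}^2 + 4q_{1,1}^2\,q_{2,0} - 4q_{3,1}\,q_{1,1} - 2q_{2,2}\,q_{2,0}$$
$$= \sigma_x^4\sigma_y^2\,\{10p'_{4,2} + 8r^2 + 1 - 4rp'_{3,1} - 2p'_{2,2}\},$$
$$\sigma_{q_{2,1}} = \frac{\sigma_x^2\sigma_y}{\sqrt{N}}\,\{10p'_{4,2} + 8r^2 + 1 - 4rp'_{3,1} - 2p'_{2,2}\}^{1/2} \tag{x}$$
$$N\sigma^2_{q_{3,1}} = q_{6,2} - q_{3,1}^2 = \sigma_x^6\sigma_y^2\,(100p'_{6,2} - p'^{\,2}_{3,1}),$$
$$\sigma_{q_{3,1}} = \frac{\sigma_x^3\sigma_y}{\sqrt{N}}\,\{100p'_{6,2} - p'^{\,2}_{3,1}\}^{1/2} \tag{xi}$$
$$N\sigma^2_{q_{4,1}} = q_{8,2} - q_{4,1}^2 + 16q_{2,0}\,q_{3,1}^2 + q_{0,2}\,q_{4,0}^2 + 8q_{1,1}\,q_{3,1}\,q_{4,0} - 8q_{5,1}\,q_{3,1} - 2q_{4,2}\,q_{4,0}$$
$$= \sigma_x^8\sigma_y^2\,\{1000p'_{8,2} + 16p'^{\,2}_{3,1} + 9 + 24rp'_{3,1} - 80p'_{5,1}\,p'_{3,1} - 60p'_{4,2}\},$$
$$\sigma_{q_{4,1}} = \frac{\sigma_x^4\sigma_y}{\sqrt{N}}\,\{1000p'_{8,2} + 16p'^{\,2}_{3,1} + 9 + 24rp'_{3,1} - 80p'_{5,1}\,p'_{3,1} - 60p'_{4,2}\}^{1/2} \tag{xii}$$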


We require accordingly to determine the following p's: $p'_{2,2}$, $p'_{3,1}$, $p'_{4,2}$, $p'_{5,1}$,
$p'_{6,2}$ and $p'_{8,2}$ by aid of our table with second differences or direct calculation from
the algebraic values in terms of r. We have

$p'_{2,2}$ = 1.173,0226, $p'_{3,1}$ = .882,3840, $p'_{4,2}$ = .403,8135,
$p'_{5,1}$ = .441,1920, $p'_{6,2}$ = .227,8602, $p'_{8,2}$ = .177,6695.

Also .6744898/$\sqrt{N}$ = .014,1505.
Substituting we find the following probable errors:
P.E. of r = .012,926,
P.E. of $q_{2,1}$ = .720,631,
P.E. of $q_{3,1}$ = 6.625,903,
P.E. of $q_{4,1}$ = 51.267,688.
We can now sum up our results for these data:
r = .2941 ± .0129,
$q_{2,1}$ = 0 ± .7206,
$q_{3,1}$ = 87.7600 ± 6.6259,
$q_{4,1}$ = 0 ± 51.2677.

The probable errors would have been to some extent modified had we been able
to calculate them on the true and not the observed r. We have
$\Delta q_{2,1}$/P.E. of $q_{2,1}$ = -2.716,
$\Delta q_{3,1}$/P.E. of $q_{3,1}$ = -2.009,
$\Delta q_{4,1}$/P.E. of $q_{4,1}$ = -2.120.

Thus none of the deviations are excessive in terms of their probable errors.
The system accordingly does not diverge very widely from the normal. At the
same time the deviations are all in one sense, i.e. in defect of the normal value, and
are all greater than twice the probable error. It appears therefore probable that
there is some significant if slight deviation from normal correlation in the growth
of the auricular height.


Illustration II. For the correlation of the contemporaneous barometric heights
at Laudale and Southampton the following values have been found:
Southampton (x) $\sigma_x$ = 3.250,067, r = .780,225,
Laudale (y) $\sigma_y$ = 3.932,290, N = 2922.
The β's of the marginal distributions show a markedly skew and non-normal
system. The regression is, however, closely linear. Discuss the values of the
product-moment coefficients:

$q_{2,1}$ = 11.919,404,
$q_{1,2}$ = 15.598,613,
$q_{2,2}$ = 401.523,496.
For a normal system with the above correlation coefficient we should have:
$q_{2,1} = q_{1,2} = 0$ and $q_{2,2} = \sigma_x^2\sigma_y^2\,p'_{2,2}$ = 362.192,761.
Thus $\Delta q_{2,1}$ = 11.919,404,
$\Delta q_{1,2}$ = 15.598,613,
$\Delta q_{2,2}$ = 39.330,735.

We require to consider the probable errors of the q's, which are given by
.6744898 times the following standard deviations:
$$\sigma_{q_{2,1}} = \frac{\sigma_x^2\sigma_y}{\sqrt{N}}\,\{10p'_{4,2} + 8r^2 + 1 - 4rp'_{3,1} - 2p'_{2,2}\}^{1/2},$$
$$\sigma_{q_{1,2}} = \frac{\sigma_x\sigma_y^2}{\sqrt{N}}\,\{10p'_{2,4} + 8r^2 + 1 - 4rp'_{1,3} - 2p'_{2,2}\}^{1/2},$$
$$\sigma_{q_{2,2}} = \frac{\sigma_x^2\sigma_y^2}{\sqrt{N}}\,\{100p'_{4,4} - p'^{\,2}_{2,2}\}^{1/2} \tag{xiii}$$
We determine for the above value of r:
$p'_{2,2}$ = 2.217,5021, $p'_{3,1} = p'_{1,3}$ = 2.340,6750,
$p'_{2,4} = p'_{4,2}$ = 1.030,501,26, $p'_{4,4}$ = .617,239,437.
Our results for a normal distribution are:
P.E. of r = .004,882,
P.E. of $q_{2,1}$ = 1.091,473,
P.E. of $q_{1,2}$ = 1.320,585,
P.E. of $q_{2,2}$ = 15.360,681.

Hence $\Delta q_{2,1}$/P.E. of $q_{2,1}$ = 10.920,
$\Delta q_{1,2}$/P.E. of $q_{1,2}$ = 11.812,
$\Delta q_{2,2}$/P.E. of $q_{2,2}$ = 2.560.
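The probable errors of this illustration can be retraced numerically. The sketch below is only an illustration under stated assumptions: the normal-theory moments are generated by the reduction p(s,t) = (t-1) p(s,t-2) + s r p(s-1,t-1), the general variance formula for a product-moment about the mean is applied term by term, and N is taken as 2922, the sample size implied by the stated probable error of r. The names `p`, `q`, `n_var_q` and `probable_error` are hypothetical.

```python
from functools import lru_cache
from math import sqrt


@lru_cache(maxsize=None)
def p(s, t, r):
    """Standardised normal product-moment p_{s,t} for correlation r."""
    if s < 0 or t < 0 or (s + t) % 2:
        return 0.0
    if s == 0 and t == 0:
        return 1.0
    if t == 0:
        return (s - 1) * p(s - 2, 0, r)
    return (t - 1) * p(s, t - 2, r) + s * r * p(s - 1, t - 1, r)


def q(s, t, r, sx, sy):
    """Product-moment q_{s,t} = sx^s * sy^t * p_{s,t}."""
    return sx ** s * sy ** t * p(s, t, r)


def n_var_q(s, t, r, sx, sy):
    """N times the sampling variance of q_{s,t} under normality."""
    def Q(a, b):
        return q(a, b, r, sx, sy)
    return (Q(2 * s, 2 * t) - Q(s, t) ** 2
            + s * s * Q(2, 0) * Q(s - 1, t) ** 2
            + t * t * Q(0, 2) * Q(s, t - 1) ** 2
            + 2 * s * t * Q(1, 1) * Q(s - 1, t) * Q(s, t - 1)
            - 2 * s * Q(s + 1, t) * Q(s - 1, t)
            - 2 * t * Q(s, t + 1) * Q(s, t - 1))


def probable_error(s, t, r, sx, sy, n):
    """Probable error = .6744898 times the standard deviation."""
    return 0.6744898 * sqrt(n_var_q(s, t, r, sx, sy) / n)


if __name__ == "__main__":
    sx, sy, r, n = 3.250067, 3.932290, 0.780225, 2922  # n is an assumption
    for s, t in [(2, 1), (1, 2), (2, 2)]:
        print((s, t), round(probable_error(s, t, r, sx, sy, n), 4))
```

With these inputs the three probable errors should agree with the values quoted in the text to about three decimal places.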


The deviations in the higher moment coefficients are at once seen to be markedly 
significant. But it will be noted that $q_{2,2}$ as in the previous case does not differ so
markedly in value from the normal as the odd moment coefficients. It seems there- 
fore likely, when a distribution is markedly skew, but the regression linear, that 
the even-even product-moment coefficients will not differ widely from the normal 
values, but that the even-odd ones will do so. It is possible that this is related to 
the fact that in distributions (such as 3 x 3 tables) which can be reduced in various 
ways to a tetrachoric table, correlation calculated from regression line diagonal 
cells is usually far more accurate than correlation calculated from non-regression 
line diagonal cells. 

Equations (x) to (xiii) are of value beyond the present illustrations. Further 
uses of the above formulae and tables are provided in a memoir on “Generalised 
Tchebycheff Theorems” which will shortly be published. We have to thank 
Dr Kirstine Smith for much help in the preparation of this paper. 
































THE CORRELATION COEFFICIENT OF A 
POLYCHORIC TABLE. 


By A. RITCHIE-SCOTT, B.Sc. 


§ 1. INTRODUCTION.


We have at our disposal a considerable number of methods for finding the 
coefficient of correlation between two characters from a table of frequencies. These 
methods may be summarily named and classified as follows: 


1. Product Moment. 


2. Tetrachoric r. 

3. Marginal centroids. 

4. Biserial r. 

5. Three Row r.

6. Variate difference methods including the correlation of grades and ranks. 
7. Equiprobable tetrachoric r. 

8. Mean contingency. 

9. Mean square contingency. 


Each of these methods has its own specially appropriate field of usefulness, but 
there still remains one class of table for which no entirely satisfactory methods have 
been devised, namely those which contain more than 2 x 2 cells and fewer than 
4 x 4, to which the tetrachoric and mean square contingency methods respectively 


may be applied. 


It was with a view to investigating satisfactory methods for such tables that the 
following work was undertaken. Such tables arise under many circumstances, 
particularly when we can, as in many psychological investigations which depend
upon the instinctive judgment of some character, definitely assign individuals with 
pronounced characters to either end of a scale, but are compelled to relegate 
doubtful cases to an intermediate but somewhat indefinite category. We have, in 
a word, good, indifferent, bad; present, doubtful, absent—classifications resulting 
in a frequency table with three categories for one or both characters. 

In the present memoir a normal distribution has been assumed as it has been 
found to be not infrequently applicable and its assumption has given fairly satis- 
factory results even with distributions which are not strictly normal. 


§ 2. NOTATION.


Let the normal surface (when standard deviations are used as units)
$$z = \frac{1}{2\pi\sqrt{1-r^2}}\,\exp\left\{-\frac{x^2 + y^2 - 2rxy}{2(1-r^2)}\right\}$$









be divided as in the diagram by planes drawn parallel to the yz plane at the points
$x = h_1$, $x = h_2$, ... and by planes parallel to the xz plane at the points $y = k_1$,
$y = k_2$, .... Let the planes intersect in lines whose projections are the points $(h_1, k_1)$,
$(h_1, k_2)$, etc., contracted to 11, 12, etc., where the first figure is the suffix of h and
the second figure is the suffix of k.










































































[Diagram: plan of the normal surface divided at $x = h_1, h_2, h_3, \ldots$ and $y = k_1, k_2, k_3, \ldots$, showing the cell frequencies $n_{st}$, the points of division 11, 21, 31, ..., the marginal totals $n_{s.}$ and $n_{.t}$, the quadrant totals $m_{s.}$ and $m_{.t}$, and the total N; one quadrant is outlined by dotted lines.]

The frequencies in the cells and the marginal totals are indicated by $n_{11}$, $n_{12}$,
etc., and $n_{1.}$, $n_{2.}$, ... $n_{.1}$, $n_{.2}$, etc., as shown in the diagram.

The surface may also be regarded as divided at each point 11, 12, ..., into four
quadrants. One of these quadrants is shown by the dotted lines in the diagram.
The quadrant in the position shown, viz. the left upper or (- -) quadrant will be
regarded in what follows as the leading quadrant and its frequency denoted by m.
Thus the quadrant shown will have the frequency m,3. 


In the ordinary scheme for a tetrachoric table, the quadrants are denoted by
a, b, c, d, and when necessary these letters will be used with the appropriate suffix.
Thus the division at the point s, t would be represented as in the diagram,

                      h_s
        a_st (= m_st) | b_st   | m_.t
  k_t   --------------+--------+-------
        c_st          | d_st   | m'_.t
        --------------+--------+-------
        m_s.          | m'_s.  | N





The marginal totals corresponding to the leading quadrant are denoted by
$m_{s.}$, $m_{.t}$ and the complementary totals by $m'_{s.}$, $m'_{.t}$.


Clearly any cell frequency may be expressed in terms of quadrant frequencies
since
$$n_{st} = m_{st} - m_{s-1,t} - m_{s,t-1} + m_{s-1,t-1}.$$
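The passage from quadrant frequencies to cell frequencies is a simple differencing, and can be sketched as follows. The function names are hypothetical, indices here run from 0 rather than 1, and a quadrant frequency with a suffix before the first division is taken as zero.

```python
def quadrants_from_cells(n):
    """Leading-quadrant totals m[s][t] = sum of n[i][j] for i <= s, j <= t."""
    rows, cols = len(n), len(n[0])
    m = [[0] * cols for _ in range(rows)]
    for s in range(rows):
        for t in range(cols):
            m[s][t] = (n[s][t]
                       + (m[s - 1][t] if s > 0 else 0)
                       + (m[s][t - 1] if t > 0 else 0)
                       - (m[s - 1][t - 1] if s > 0 and t > 0 else 0))
    return m


def cells_from_quadrants(m):
    """Cell frequency n_st = m_st - m_(s-1,t) - m_(s,t-1) + m_(s-1,t-1)."""
    rows, cols = len(m), len(m[0])

    def get(s, t):
        return m[s][t] if s >= 0 and t >= 0 else 0

    return [[get(s, t) - get(s - 1, t) - get(s, t - 1) + get(s - 1, t - 1)
             for t in range(cols)] for s in range(rows)]


if __name__ == "__main__":
    table = [[12, 7, 3], [9, 20, 8], [2, 11, 28]]   # made-up frequencies
    m = quadrants_from_cells(table)
    print(m)
    print(cells_from_quadrants(m) == table)
```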


§ 3. ENNEACHORIC METHOD. 


In order to determine r, since we assume the distribution to be normal and we
know the marginal totals, only one more datum is required. This for example
may be a frequency block (or the total frequency on a continuous system of cells).
As special cases we have the "briquette" or frequency on a rectangle of cells or
again the quadrant frequency. The block may be the frequency contents of a
group of corner, marginal or internal cells*. Consider (for future use) the general
case of a quadrant frequency.

Let
$$\frac{m_{s.}}{N} = {}_s\tau_0, \qquad \frac{m_{.t}}{N} = {}_t\theta_0,$$
where ${}_s\tau$ and ${}_t\theta$ are the tetrachoric coefficients. Then
$$\frac{m_{st}}{N} = {}_s\tau_0\,{}_t\theta_0 + {}_s\tau_1\,{}_t\theta_1\,r + {}_s\tau_2\,{}_t\theta_2\,r^2 + \ldots = {}_s\tau_0\,{}_t\theta_0 + \mathfrak{T}({}_s\tau, {}_t\theta, r) \tag{1}$$

In using Everitt's Tables of the tetrachoric functions in which $\tau_0$ and $\theta_0$ must
be less than ½ we must either rearrange the table or adjust the above formula for
the position of the mean with regard to the quadrants. It is more convenient to
adjust the formula as follows, dropping the suffixes as we are dealing with any point
of division.

* A "cell" is the least element for which the frequency is provided in the original data. Cells grouped
together for any working purposes are collectively termed a "block."
Let $1 - \tau_0 = \tau_0'$, $1 - \theta_0 = \theta_0'$.

Mean in a,
$$\frac{m}{N} = \tau_0 + \theta_0 - 1 + \frac{d}{N}$$
$$= \tau_0 + \theta_0 - 1 + \tau_0'\theta_0' + \tau_1'\theta_1'\,r + \tau_2'\theta_2'\,r^2 + \ldots$$
$$= \tau_0 + \theta_0 - 1 + (1 - \tau_0)(1 - \theta_0) + \mathfrak{T}(\tau', \theta', r)$$
$$= \tau_0\theta_0 + \mathfrak{T}(\tau', \theta', r) \tag{2}$$
Mean in b,
$$\frac{m}{N} = \tau_0 - \frac{c}{N}$$
$$= \tau_0 - (\tau_0\theta_0' + \tau_1\theta_1'(-r) + \tau_2\theta_2'(-r)^2 + \ldots)$$
$$= \tau_0 - \tau_0(1 - \theta_0) - \mathfrak{T}(\tau, \theta', -r)$$
$$= \tau_0\theta_0 - \mathfrak{T}(\tau, \theta', -r) \tag{3}$$
Mean in c,
$$\frac{m}{N} = \theta_0 - \frac{b}{N}$$
$$= \theta_0 - (\tau_0'\theta_0 + \tau_1'\theta_1(-r) + \tau_2'\theta_2(-r)^2 + \ldots)$$
$$= \theta_0 - (1 - \tau_0)\theta_0 - \mathfrak{T}(\tau', \theta, -r)$$
$$= \tau_0\theta_0 - \mathfrak{T}(\tau', \theta, -r) \tag{4}$$
Mean in d,
$$\frac{m}{N} = \tau_0\theta_0 + \mathfrak{T}(\tau, \theta, r) \tag{5}$$
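The series for a quadrant frequency can be illustrated numerically. The sketch below assumes the customary definition of the tetrachoric functions, namely that the nth function at h is the Hermite polynomial of order n-1 at h, multiplied by the normal ordinate at h and divided by the square root of n!; this definition is an assumption brought in for the example, since the memoir relies on Everitt's tables instead. The helper names are hypothetical, and the result is the proportion of frequency with x < h and y < k.

```python
from math import erf, exp, factorial, pi, sqrt


def phi(x):
    """Ordinate of the unit normal curve."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)


def Phi(x):
    """Area of the unit normal curve below x."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def hermite(n, x):
    """Hermite polynomial He_n(x): He_0 = 1, He_1 = x, He_{k+1} = x He_k - k He_{k-1}."""
    a, b = 1.0, x
    if n == 0:
        return a
    for k in range(1, n):
        a, b = b, x * b - k * a
    return b


def quadrant_fraction(h, k, r, terms=60):
    """Proportion below h and k: tau_0 theta_0 + sum of tau_n theta_n r^n."""
    total = Phi(h) * Phi(k)
    for n in range(1, terms + 1):
        total += (r ** n * hermite(n - 1, h) * hermite(n - 1, k)
                  * phi(h) * phi(k) / factorial(n))
    return total


if __name__ == "__main__":
    # Division at the mean: the quadrant holds 1/4 + arcsin(r) / (2 pi).
    print(quadrant_fraction(0.0, 0.0, 0.5))
```

In this form the same expression serves wherever the mean lies, the rearrangements in (2) to (5) being needed only because the printed tables are restricted to tail areas below one half.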


In place of taking a quadrant we may take a marginal or internal block. I shall
only consider the latter as the case of a marginal block may be deduced from that
of an internal block by removing one of the bounding planes to an infinite distance.


In discussing the central block we in effect reduce any table to a 3 x 3 (ennea- 
choric) table as we consider it to be constituted of a central block (or group of cells) 
and 4 marginal and 4 corner blocks. I may therefore use the nomenclature for a 
3 x 3 table without any loss of generality. 

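$$n_{22} = m_{22} - m_{12} - m_{21} + m_{11},$$
$$\frac{n_{22}}{N} = {}_2\tau_0\,{}_2\theta_0 + \mathfrak{T}({}_2\tau, {}_2\theta, r) - {}_1\tau_0\,{}_2\theta_0 - \mathfrak{T}({}_1\tau, {}_2\theta, r) - {}_2\tau_0\,{}_1\theta_0 - \mathfrak{T}({}_2\tau, {}_1\theta, r) + {}_1\tau_0\,{}_1\theta_0 + \mathfrak{T}({}_1\tau, {}_1\theta, r)$$
$$= ({}_2\tau_0 - {}_1\tau_0)({}_2\theta_0 - {}_1\theta_0) + ({}_2\tau_1 - {}_1\tau_1)({}_2\theta_1 - {}_1\theta_1)\,r + ({}_2\tau_2 - {}_1\tau_2)({}_2\theta_2 - {}_1\theta_2)\,r^2 + \text{etc.}$$
$$= \frac{n_{2.}\,n_{.2}}{N^2} + ({}_2\tau_1 - {}_1\tau_1)({}_2\theta_1 - {}_1\theta_1)\,r + ({}_2\tau_2 - {}_1\tau_2)({}_2\theta_2 - {}_1\theta_2)\,r^2 + \text{etc.} \tag{6}$$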


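As an example of the rearrangement of the formula for computation consider
the case when the mean is in $n_{22}$:
$$\text{(mean in } a\text{)}\quad \frac{m_{22}}{N} = {}_2\tau_0\,{}_2\theta_0 + {}_2\tau_1'\,{}_2\theta_1'\,r + {}_2\tau_2'\,{}_2\theta_2'\,r^2 + {}_2\tau_3'\,{}_2\theta_3'\,r^3 + \ldots,$$
$$\text{(mean in } b\text{)}\quad \frac{m_{12}}{N} = {}_1\tau_0\,{}_2\theta_0 + {}_1\tau_1\,{}_2\theta_1'\,r - {}_1\tau_2\,{}_2\theta_2'\,r^2 + {}_1\tau_3\,{}_2\theta_3'\,r^3 - \ldots,$$
$$\text{(mean in } c\text{)}\quad \frac{m_{21}}{N} = {}_2\tau_0\,{}_1\theta_0 + {}_2\tau_1'\,{}_1\theta_1\,r - {}_2\tau_2'\,{}_1\theta_2\,r^2 + {}_2\tau_3'\,{}_1\theta_3\,r^3 - \ldots,$$
$$\text{(mean in } d\text{)}\quad \frac{m_{11}}{N} = {}_1\tau_0\,{}_1\theta_0 + {}_1\tau_1\,{}_1\theta_1\,r + {}_1\tau_2\,{}_1\theta_2\,r^2 + {}_1\tau_3\,{}_1\theta_3\,r^3 + \ldots.$$
$$\therefore\ \frac{n_{22}}{N} = \frac{1}{N}(m_{22} - m_{12} - m_{21} + m_{11})$$
$$= \frac{n_{2.}\,n_{.2}}{N^2} + ({}_2\tau_1' - {}_1\tau_1)({}_2\theta_1' - {}_1\theta_1)\,r + ({}_2\tau_2' + {}_1\tau_2)({}_2\theta_2' + {}_1\theta_2)\,r^2 + ({}_2\tau_3' - {}_1\tau_3)({}_2\theta_3' - {}_1\theta_3)\,r^3 + \text{etc.} \tag{7}$$
It will be noted that when one set of categories is symmetrical about the mean,
i.e. when say ${}_2\tau_1' = {}_1\tau_1$, all the terms of odd degree in r vanish. This corresponds
to the fact that symmetrical categories may be reversed without altering the
numerical value of the marginal totals and their relation to the central frequency;
but such reversal will change the sign of r.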


§ 4. STANDARD DEVIATION OF r BY ENNEACHORIC METHOD.


We have now to determine the probable error of r found in this manner. 

Throughout what follows differentials will be used to indicate random sample 
variations, i.e. it is always supposed that the variations are small as compared to 
any quantity varied so that all the dn’s are small, or all the n’s are large quantities. 


$$m_{st} = N\int_{-\infty}^{h_s}\int_{-\infty}^{k_t} z(x, y, r)\,dx\,dy = f(h_s, k_t, r) \tag{8}$$

Since the variations of the means and the standard deviations are, in this form
of $m_{st}$, involved in the variations of $h_s$ and $k_t$, we have
$$dm_{st} = \frac{\partial f}{\partial h_s}\,dh_s + \frac{\partial f}{\partial k_t}\,dk_t + \frac{\partial f}{\partial r}\,dr \tag{9}$$
Evaluating the differential coefficients,
$$\frac{\partial f}{\partial h_s} = N\int_{-\infty}^{k_t} z(h_s, y, r)\,dy = N\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}h_s^2}\cdot\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\frac{k_t - rh_s}{\sqrt{1-r^2}}} e^{-\frac{1}{2}y^2}\,dy \tag{10}$$

This is the area of that portion of the dichotomic plane $x = h_s$ which bounds
the quadrant $m_{st}$. But the area of the whole dichotomic plane is
$$N\int_{-\infty}^{+\infty} z(h_s, y, r)\,dy = \frac{N}{\sqrt{2\pi}}\,e^{-\frac{1}{2}h_s^2} = NH_s \tag{11}$$
so that if we write
$$\frac{\partial f}{\partial h_s} = NA_{st}H_s \tag{12}$$
where
$$A_{st} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\frac{k_t - rh_s}{\sqrt{1-r^2}}} e^{-\frac{1}{2}y^2}\,dy \tag{13}$$
the factor $A_{st}$ will be that fraction of the whole dichotomic plane section, which
bounds the quadrant $m_{st}$ and will have no dimensions. The value of $A_{st}$ may be
taken directly from Sheppard's tables of the probability integral entering with
the argument $\dfrac{k_t - rh_s}{\sqrt{1-r^2}}$. It will be convenient to refer to this tabled integral as $G$,
so that
$$A_{st} = G\left(\frac{k_t - rh_s}{\sqrt{1-r^2}}\right) \tag{14}$$

It is convenient to note that with this notation
$$G(-z) = 1 - G(z).$$

Further since
$$m_{s.} = \frac{N}{\sqrt{2\pi}}\int_{-\infty}^{h_s} e^{-\frac{1}{2}x^2}\,dx,$$
$$\frac{dm_{s.}}{dh_s} = \frac{N}{\sqrt{2\pi}}\,e^{-\frac{1}{2}h_s^2} = NH_s,$$
and
$$dh_s = \frac{dm_{s.}}{NH_s}.$$
Hence
$$\frac{\partial f}{\partial h_s}\,dh_s = NA_{st}H_s\,\frac{dm_{s.}}{NH_s} = A_{st}\,dm_{s.} \tag{15}$$
Similarly
$$\frac{\partial f}{\partial k_t}\,dk_t = B_{st}\,dm_{.t} \tag{16}$$
where
$$B_{st} = G\left(\frac{h_s - rk_t}{\sqrt{1-r^2}}\right) \tag{17}$$
Lastly
$$\frac{\partial f}{\partial r} = N\,\frac{\partial}{\partial r}\int_{-\infty}^{h_s}\int_{-\infty}^{k_t} z(x, y, r)\,dx\,dy = N\int_{-\infty}^{h_s}\int_{-\infty}^{k_t}\frac{\partial z}{\partial r}\,dx\,dy = N\int_{-\infty}^{h_s}\int_{-\infty}^{k_t}\frac{\partial^2 z}{\partial x\,\partial y}\,dx\,dy = Nz(h_s, k_t, r) = \chi_{st} \tag{18}$$
which is the length of the ordinate at the point (s, t). We may now write
$$dm_{st} = A_{st}\,dm_{s.} + B_{st}\,dm_{.t} + \chi_{st}\,dr \tag{19}$$
and
$$-\chi_{st}\,dr = A_{st}\,dm_{s.} + B_{st}\,dm_{.t} - dm_{st} \tag{20}$$
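The three differential coefficients can be checked by numerical differentiation. The following sketch works in proportions (N = 1) and assumes the quadrant proportion is computed from the Hermite-polynomial form of the tetrachoric series; that series, and all function names, are assumptions made for the illustration rather than the memoir's own procedure.

```python
from math import erf, exp, factorial, pi, sqrt


def phi(x):
    return exp(-0.5 * x * x) / sqrt(2 * pi)


def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))


def hermite(n, x):
    """Hermite polynomial He_n(x) by the three-term recurrence."""
    a, b = 1.0, x
    if n == 0:
        return a
    for k in range(1, n):
        a, b = b, x * b - k * a
    return b


def f(h, k, r, terms=60):
    """Proportion of the normal surface with x < h and y < k."""
    total = Phi(h) * Phi(k)
    for n in range(1, terms + 1):
        total += (r ** n * hermite(n - 1, h) * hermite(n - 1, k)
                  * phi(h) * phi(k) / factorial(n))
    return total


def density(h, k, r):
    """Ordinate z(h, k, r) of the unit normal surface."""
    return (exp(-(h * h + k * k - 2 * r * h * k) / (2 * (1 - r * r)))
            / (2 * pi * sqrt(1 - r * r)))


def check_derivatives(h, k, r, eps=1e-5):
    """Return (numerical, analytic) pairs for df/dh, df/dk and df/dr."""
    s = sqrt(1 - r * r)
    d_h = (f(h + eps, k, r) - f(h - eps, k, r)) / (2 * eps)
    d_k = (f(h, k + eps, r) - f(h, k - eps, r)) / (2 * eps)
    d_r = (f(h, k, r + eps) - f(h, k, r - eps)) / (2 * eps)
    return ((d_h, phi(h) * Phi((k - r * h) / s)),   # H * A
            (d_k, phi(k) * Phi((h - r * k) / s)),   # K * B
            (d_r, density(h, k, r)))                # chi / N


if __name__ == "__main__":
    for numeric, analytic in check_derivatives(0.3, -0.2, 0.4):
        print(round(numeric, 8), round(analytic, 8))
```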


Considerable use will be made of this formula later and the following abbreviated 

notation will be used: 
$$A_{st}\,dm_{s.} + B_{st}\,dm_{.t} - dm_{st} = \delta P_{st} = -\chi_{st}\,dr \tag{21}$$
and
$$A_{st}\,m_{s.} + B_{st}\,m_{.t} - m_{st} = P_{st} \tag{22}$$

The reader must be careful to note that $\delta P_{st}$ is not $dP_{st}$ but only a part of it,
and this symbol is used here as at once a conventional abbreviation and a memoria
technica.

Since $n_{22} = m_{22} - m_{12} - m_{21} + m_{11}$,
$$\therefore\ dn_{22} = dm_{22} - dm_{12} - dm_{21} + dm_{11}$$
$$= A_{22}\,dm_{2.} + B_{22}\,dm_{.2} + \chi_{22}\,dr - \ldots + A_{11}\,dm_{1.} + B_{11}\,dm_{.1} + \chi_{11}\,dr \tag{23}$$
or
$$-(\chi_{22} - \chi_{12} - \chi_{21} + \chi_{11})\,dr = (A_{22} - A_{21})\,dm_{2.} - (A_{12} - A_{11})\,dm_{1.} + (B_{22} - B_{12})\,dm_{.2} - (B_{21} - B_{11})\,dm_{.1} - dn_{22} \tag{24}$$
Reference to the diagram shows that $A_{22} - A_{21}$ and the other coefficients are
the proportions of the areas of the trapezettes bounding the briquette of volume
$n_{22}$. These areas may be systematically named for the whole table thus,

[Diagram: the point s, t with the area $\alpha_{st}$ marked on the plane $x = h_s$ and the area $\beta_{st}$ on the plane $y = k_t$.]

that is, the areas of the planes meeting in the line of which the point s, t is the
projection, in the direction shown, are named from the point so that
$$A_{s,t} - A_{s,t-1} = \alpha_{st},$$
$$B_{s,t} - B_{s-1,t} = \beta_{st}.$$
Hence we may write
$$A_{22} - A_{21} = \alpha_{22}, \qquad A_{12} - A_{11} = \alpha_{12},$$
$$B_{22} - B_{12} = \beta_{22}, \qquad B_{21} - B_{11} = \beta_{21}.$$

If now we notice that since $m_{2.} = N - n_{3.}$ etc., and $m_{1.} = n_{1.}$,
$$\therefore\ dm_{2.} = -dn_{3.}, \quad dm_{.2} = -dn_{.3}, \quad dm_{1.} = dn_{1.}, \quad dm_{.1} = dn_{.1},$$
we then have
$$-(\chi_{11} - \chi_{12} - \chi_{21} + \chi_{22})\,dr = -(\alpha_{22}\,dn_{3.} + \alpha_{12}\,dn_{1.} + \beta_{22}\,dn_{.3} + \beta_{21}\,dn_{.1} + dn_{22}) \tag{25}$$


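Expanding this in terms of frequency volumes this becomes
$$(\chi_{11} - \chi_{12} - \chi_{21} + \chi_{22})\,dr = (\alpha_{12} + \beta_{21})\,dn_{11} + \beta_{21}\,dn_{21} + (\alpha_{22} + \beta_{21})\,dn_{31} + \alpha_{12}\,dn_{12} + dn_{22} + \alpha_{22}\,dn_{32} + (\alpha_{12} + \beta_{22})\,dn_{13} + \beta_{22}\,dn_{23} + (\alpha_{22} + \beta_{22})\,dn_{33} \tag{26}$$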
It has already been shown by Pearson (Biometrika, vol. IX, p. 1) that when
random samples are taken from a population so large that its composition is not
appreciably affected by removing the samples we have the following relations:
$$\text{Mean}\,(dn_{st})^2 = n_{st}\left(1 - \frac{n_{st}}{N}\right) \tag{27}$$
$$\text{Mean}\,(dn_{st}\,dn_{s't'}) = -\frac{n_{st}\,n_{s't'}}{N} \tag{28a}$$
$$\text{Mean}\,(dn_{s.}\,dn_{s'.}) = -\frac{n_{s.}\,n_{s'.}}{N} \tag{28b}$$
$$\text{Mean}\,(dn_{s.}\,dn_{.t}) = n_{st} - \frac{n_{s.}\,n_{.t}}{N} \tag{28c}$$
$$\text{Mean}\,(dn_{st}\,dn_{.t'}) = -\frac{n_{st}\,n_{.t'}}{N} \tag{28d}$$
$$\text{Mean}\,(dn_{st}\,dn_{s'.}) = -\frac{n_{st}\,n_{s'.}}{N} \tag{28e}$$
$$\text{Mean}\,(dn_{st}\,dn_{s.}) = n_{st}\left(1 - \frac{n_{s.}}{N}\right) \tag{28f}$$
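Hence squaring both sides, summing for all possible samples and dividing
by the number of samples we have,
$$(\chi_{11} - \chi_{12} - \chi_{21} + \chi_{22})^2\,\sigma_r^2 = (\alpha_{12} + \beta_{21})^2 n_{11} + \beta_{21}^2 n_{21} + (\alpha_{22} + \beta_{21})^2 n_{31} + \alpha_{12}^2 n_{12} + n_{22} + \alpha_{22}^2 n_{32} + (\alpha_{12} + \beta_{22})^2 n_{13} + \beta_{22}^2 n_{23} + (\alpha_{22} + \beta_{22})^2 n_{33}$$
$$- \frac{1}{N}\Big\{(\alpha_{12} + \beta_{21})\,n_{11} + \beta_{21}\,n_{21} + (\alpha_{22} + \beta_{21})\,n_{31} + \alpha_{12}\,n_{12} + n_{22} + \alpha_{22}\,n_{32} + (\alpha_{12} + \beta_{22})\,n_{13} + \beta_{22}\,n_{23} + (\alpha_{22} + \beta_{22})\,n_{33}\Big\}^2 \tag{29}$$
The expression within the large brackets
$$= \alpha_{12}\,n_{1.} + \alpha_{22}\,n_{3.} + \beta_{21}\,n_{.1} + \beta_{22}\,n_{.3} + n_{22}.$$
Calling this Nm we have
$$(\chi_{11} - \chi_{12} - \chi_{21} + \chi_{22})^2\,\sigma_r^2 = (\alpha_{12} + \beta_{21} - m)^2 n_{11} + (\beta_{21} - m)^2 n_{21} + (\alpha_{22} + \beta_{21} - m)^2 n_{31} + (\alpha_{12} - m)^2 n_{12} + (1 - m)^2 n_{22} + (\alpha_{22} - m)^2 n_{32} + (\alpha_{12} + \beta_{22} - m)^2 n_{13} + (\beta_{22} - m)^2 n_{23} + (\alpha_{22} + \beta_{22} - m)^2 n_{33} \tag{30}$$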
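The step from the form with the squared bracket to the form with deviations from m is the familiar identity between a raw and a centred sum of squares. A minimal sketch, with made-up coefficients and frequencies and a hypothetical function name:

```python
def both_forms(c, n):
    """Return the raw form sum(c^2 n) - (sum(c n))^2 / N and the
    centred form sum((c - m)^2 n), where N m = sum(c n)."""
    N = sum(n)
    weighted = sum(ci * ni for ci, ni in zip(c, n))
    m = weighted / N
    raw = sum(ci * ci * ni for ci, ni in zip(c, n)) - weighted ** 2 / N
    centred = sum((ci - m) ** 2 * ni for ci, ni in zip(c, n))
    return raw, centred


if __name__ == "__main__":
    # Nine illustrative coefficients (one per cell) and nine cell frequencies.
    c = [0.55, 0.30, 0.70, 0.25, 1.00, 0.40, 0.60, 0.35, 0.75]
    n = [40, 55, 21, 62, 130, 58, 19, 60, 55]
    print(both_forms(c, n))
```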


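The following form for m is instructive although giving an apparently less
symmetrical form than the above,
$$m = \frac{1}{N}(\alpha_{12}\,n_{1.} + \alpha_{22}\,n_{3.} + \beta_{21}\,n_{.1} + \beta_{22}\,n_{.3} + n_{22})$$
$$= \frac{1}{N}\{(A_{12} - A_{11})\,m_{1.} + (A_{22} - A_{21})(N - m_{2.}) + (B_{21} - B_{11})\,m_{.1} + (B_{22} - B_{12})(N - m_{.2}) + m_{22} - m_{21} - m_{12} + m_{11}\}$$
$$= \frac{1}{N}\{A_{12}m_{1.} + B_{12}m_{.2} - m_{12} - A_{11}m_{1.} - B_{11}m_{.1} + m_{11} - A_{22}m_{2.} - B_{22}m_{.2} + m_{22} + A_{21}m_{2.} + B_{21}m_{.1} - m_{21} + N(A_{22} - A_{21} + B_{22} - B_{12})\}$$
$$= \alpha_{22} + \beta_{22} - \frac{P_{11} - P_{12} - P_{21} + P_{22}}{N} = \alpha_{22} + \beta_{22} - \frac{P}{N} \tag{31}$$
where for brevity P is written for $P_{11} - P_{12} - P_{21} + P_{22}$.

We may then write
$$(\chi_{11} - \chi_{12} - \chi_{21} + \chi_{22})^2\,\sigma_r^2 = \left(\alpha_{12} - \alpha_{22} + \beta_{21} - \beta_{22} + \tfrac{P}{N}\right)^2 n_{11} + \left(0 - \alpha_{22} + \beta_{21} - \beta_{22} + \tfrac{P}{N}\right)^2 n_{21} + \left(\beta_{21} - \beta_{22} + \tfrac{P}{N}\right)^2 n_{31}$$
$$+ \left(\alpha_{12} - \alpha_{22} + 0 - \beta_{22} + \tfrac{P}{N}\right)^2 n_{12} + \left(\tfrac{1}{2} - \alpha_{22} + \tfrac{1}{2} - \beta_{22} + \tfrac{P}{N}\right)^2 n_{22} + \left(0 - \beta_{22} + \tfrac{P}{N}\right)^2 n_{32}$$
$$+ \left(\alpha_{12} - \alpha_{22} + \tfrac{P}{N}\right)^2 n_{13} + \left(0 - \alpha_{22} + \tfrac{P}{N}\right)^2 n_{23} + \left(\tfrac{P}{N}\right)^2 n_{33} \tag{32}$$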
It will be shown in a later paper that the coefficients of the cell frequencies are
functions of moments of the frequencies about the mean.


If any relation between $h_1$, $h_2$, $k_1$, $k_2$ makes
$$\chi_{11} - \chi_{12} - \chi_{21} + \chi_{22} = 0,$$
while the right side of the $\sigma_r^2$ equation remains finite the value of the expression for
$\sigma_r^2$ will become infinite. $\chi_{11} - \chi_{12} - \chi_{21} + \chi_{22}$ obviously vanishes (1) when $h_1 = h_2$ or
$k_1 = k_2$, i.e. when either of the central categories becomes vanishingly small, and
(2) when $h_1 = -h_2$, $k_1 = -k_2$ and r = 0, i.e. when both sets of the extreme
categories have equal frequencies and the correlation is very small.


(1) When $h_1 = h_2$. Then $\chi_{11} = \chi_{21}$, $\chi_{12} = \chi_{22}$, $\alpha_{12} = \alpha_{22} = \alpha$ say, $\beta_{21} = \beta_{22} = 0$,
$n_{21} = n_{22} = n_{23} = 0$, $n_{2.} = 0$.
Then
$$m = \frac{\alpha\,(n_{1.} + n_{3.})}{N} = \alpha,$$
and the right side of the $\sigma_r^2$ equation reduces to zero giving the indeterminate form
0/0 for $\sigma_r^2$.


(2) When $h_1 = -h_2$, $k_1 = -k_2$ and r = 0. This case will be discussed in the
next section.


§ 5. STANDARD DEVIATION OF ENNEACHORIC r IN SPECIAL CASES.


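Two particular cases are of interest, (1) when r = 0, (2) when the table degenerates
into a 2 x 2 table.


(1) When r = 0,
$$A_{st} = G(k_t) = \frac{m_{.t}}{N}, \qquad B_{st} = G(h_s) = \frac{m_{s.}}{N},$$
$$P_{st} = \frac{m_{.t}\,m_{s.}}{N} + \frac{m_{s.}\,m_{.t}}{N} - m_{st} = \frac{m_{s.}\,m_{.t}}{N} = m_{st}.$$
Hence
$$\alpha_{22} = A_{22} - A_{21} = \frac{m_{.2} - m_{.1}}{N} = \frac{n_{.2}}{N},$$
$$\alpha_{12} = A_{12} - A_{11} = \frac{m_{.2} - m_{.1}}{N} = \frac{n_{.2}}{N}.$$
Similarly
$$\beta_{22} = \beta_{21} = \frac{n_{2.}}{N},$$
$$m = \alpha_{22} + \beta_{22} - \frac{P_{11} - P_{12} - P_{21} + P_{22}}{N} = \frac{n_{.2}}{N} + \frac{n_{2.}}{N} - \frac{1}{N}(m_{11} - m_{12} - m_{21} + m_{22}) = \frac{n_{.2}}{N} + \frac{n_{2.}}{N} - \frac{n_{22}}{N},$$
$$\chi_{11} - \chi_{12} - \chi_{21} + \chi_{22} = N(H_1K_1 - H_1K_2 - H_2K_1 + H_2K_2) = N(H_1 - H_2)(K_1 - K_2),$$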


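and the right-hand side of the equation reduces to
$$\left(\frac{n_{22}}{N}\right)^2(n_{11} + n_{13} + n_{31} + n_{33}) + \left(\frac{n_{.2} - n_{22}}{N}\right)^2(n_{21} + n_{23}) + \left(\frac{n_{2.} - n_{22}}{N}\right)^2(n_{12} + n_{32}) + \left(1 - \frac{n_{.2}}{N} - \frac{n_{2.}}{N} + \frac{n_{22}}{N}\right)^2 n_{22}.$$
Remembering that $\dfrac{n_{2.}\,n_{.2}}{N} = n_{22}$ this resolves into
$$\left(\frac{n_{2.}\,n_{.2}}{N^2}\right)^2\frac{(N - n_{2.})(N - n_{.2})}{N} + \left(\frac{n_{.2}(N - n_{2.})}{N^2}\right)^2\frac{n_{2.}(N - n_{.2})}{N} + \left(\frac{n_{2.}(N - n_{.2})}{N^2}\right)^2\frac{(N - n_{2.})\,n_{.2}}{N} + \left(\frac{(N - n_{2.})(N - n_{.2})}{N^2}\right)^2\frac{n_{2.}\,n_{.2}}{N}$$
$$= \frac{(N - n_{2.})(N - n_{.2})\,n_{2.}\,n_{.2}}{N^5}\,\{n_{2.}n_{.2} + (N - n_{2.})\,n_{.2} + (N - n_{.2})\,n_{2.} + (N - n_{2.})(N - n_{.2})\}$$
$$= \frac{n_{2.}(N - n_{2.})\,n_{.2}(N - n_{.2})}{N^3} \tag{33}$$
Hence
$$N^2(H_1 - H_2)^2(K_1 - K_2)^2\,\sigma_r^2 = \frac{n_{2.}(N - n_{2.})\,n_{.2}(N - n_{.2})}{N^3}$$
and
$$\sigma_r = \frac{1}{\sqrt{N}\cdot N(H_1 - H_2)(K_1 - K_2)}\cdot\frac{\sqrt{n_{2.}(N - n_{2.})\,n_{.2}(N - n_{.2})}}{N} \tag{34}$$

This may also be expressed as follows:
$$N^2(H_1 - H_2)^2(K_1 - K_2)^2\,\sigma_r^2 = \frac{1}{N}\,n_{22}\,(n_{11} + n_{13} + n_{31} + n_{33}) \tag{35}$$
and
$$\sigma_r = \frac{\sqrt{\frac{1}{N}\,n_{22}\,(n_{11} + n_{13} + n_{31} + n_{33})}}{N(H_1 - H_2)(K_1 - K_2)} \tag{36}$$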
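For zero correlation the two expressions for the standard deviation may be compared on an artificial table. The sketch below builds a 3 x 3 table from chosen points of division under independence and evaluates both forms; the points of division, the sample size and the function names are made up for the illustration, and the absolute value of the product of ordinate differences is taken so that the result is positive.

```python
from math import erf, exp, pi, sqrt


def phi(x):
    return exp(-0.5 * x * x) / sqrt(2 * pi)


def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))


def sigma_r_zero_corr(h1, h2, k1, k2, N):
    """Standard deviation of enneachoric r at r = 0, computed two ways."""
    rows = [N * Phi(h1), N * (Phi(h2) - Phi(h1)), N * (1 - Phi(h2))]   # n_s.
    cols = [N * Phi(k1), N * (Phi(k2) - Phi(k1)), N * (1 - Phi(k2))]   # n_.t
    # Under independence each cell is the product of its margins over N.
    n = [[rs * ct / N for ct in cols] for rs in rows]
    denom = N * abs((phi(h1) - phi(h2)) * (phi(k1) - phi(k2)))
    from_margins = sqrt(rows[1] * (N - rows[1]) * cols[1] * (N - cols[1])
                        / N ** 3) / denom
    corners = n[0][0] + n[0][2] + n[2][0] + n[2][2]
    from_cells = sqrt(n[1][1] * corners / N) / denom
    return from_margins, from_cells


if __name__ == "__main__":
    print(sigma_r_zero_corr(-1.0, 0.5, -0.8, 0.3, 1000))
```

As the points of division approach symmetry about the mean the denominator tends to zero, which is the difficulty taken up in the text that follows.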





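When the extreme categories are nearly equal and r is zero $H_1 = H_2$ and $K_1 = K_2$
and the value for $\sigma_r^2$ becomes infinite. It is necessary then to keep second powers
of differences to determine $\sigma_r^2$. Keeping second order terms we have:
$$dm = \frac{\partial f}{\partial h}\,dh + \frac{\partial f}{\partial k}\,dk + \frac{\partial f}{\partial r}\,dr + \frac{1}{2}\left\{\frac{\partial^2 f}{\partial h^2}(dh)^2 + \frac{\partial^2 f}{\partial k^2}(dk)^2 + \frac{\partial^2 f}{\partial r^2}(dr)^2 + 2\frac{\partial^2 f}{\partial h\,\partial k}(dh\,dk) + 2\frac{\partial^2 f}{\partial h\,\partial r}(dh\,dr) + 2\frac{\partial^2 f}{\partial k\,\partial r}(dk\,dr)\right\} \tag{37}$$

To find $\dfrac{\partial^2 f}{\partial h^2}$ etc. we may proceed as follows.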
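By straightforward differentiation we find that
$$\frac{\partial^2 f}{\partial h^2} = N\left(-hHA - \frac{r\chi}{N}\right) \tag{38}$$
$$\frac{\partial^2 f}{\partial k^2} = N\left(-kKB - \frac{r\chi}{N}\right) \tag{39}$$
$$\frac{\partial^2 f}{\partial r^2} = \chi\left\{\frac{r}{1 - r^2} + \frac{(h - rk)(k - rh)}{(1 - r^2)^2}\right\} \tag{40}$$
$$\frac{\partial^2 f}{\partial h\,\partial k} = \chi \tag{41}$$
$$\frac{\partial^2 f}{\partial h\,\partial r} = -\frac{h - rk}{1 - r^2}\,\chi \tag{42}$$
$$\frac{\partial^2 f}{\partial k\,\partial r} = -\frac{k - rh}{1 - r^2}\,\chi \tag{43}$$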


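Hence summing for all samples and dividing by the number of samples and
denoting this operation by $\mathfrak{S}$, we have, since quantities of the first order disappear,
an equation involving $\mathfrak{S}(dh)^2$, $\mathfrak{S}(dh\,dk)$, etc. We may determine these
values thus:
$$dh_s = \frac{dm_{s.}}{NH_s},$$
$$\mathfrak{S}(dh_s)^2 = \mathfrak{S}\left(\frac{dm_{s.}}{NH_s}\right)^2 = \frac{m_{s.}\left(1 - \frac{m_{s.}}{N}\right)}{N^2H_s^2} \tag{44}$$
Similarly
$$\mathfrak{S}(dk_t)^2 = \frac{m_{.t}\left(1 - \frac{m_{.t}}{N}\right)}{N^2K_t^2} \tag{45}$$
$$\mathfrak{S}(dh_s\,dk_t) = \frac{\mathfrak{S}(dm_{s.}\,dm_{.t})}{N^2H_sK_t} = \frac{m_{st} - \frac{m_{s.}\,m_{.t}}{N}}{N^2H_sK_t} \tag{46}$$

To find $\mathfrak{S}(dh\,dr)$ we have
$$-\chi\,dr = A_{st}\,dm_{s.} + B_{st}\,dm_{.t} - dm_{st},$$
$$\therefore\ -\chi\,dr\,dh_s = -\frac{\chi\,dr\,dm_{s.}}{NH_s},$$
$$\therefore\ -\chi NH_s\,\mathfrak{S}(dr\,dh_s) = A_{st}\,\mathfrak{S}(dm_{s.})^2 + B_{st}\,\mathfrak{S}(dm_{s.}\,dm_{.t}) - \mathfrak{S}(dm_{st}\,dm_{s.})$$
$$= A_{st}\,m_{s.}\left(1 - \frac{m_{s.}}{N}\right) + B_{st}\left(m_{st} - \frac{m_{s.}\,m_{.t}}{N}\right) - m_{st}\left(1 - \frac{m_{s.}}{N}\right) \tag{47}$$

After simplification this becomes
$$\left(1 - \frac{m_{s.}}{N}\right)P_{st} - B_{st}\,(m_{.t} - m_{st}),$$
$$\therefore\ \mathfrak{S}(dr\,dh) = -\frac{1}{\chi NH_s}\left\{\left(1 - \frac{m_{s.}}{N}\right)P_{st} - B_{st}\,(m_{.t} - m_{st})\right\} \tag{48}$$
Similarly
$$\mathfrak{S}(dr\,dk) = -\frac{1}{\chi NK_t}\left\{\left(1 - \frac{m_{.t}}{N}\right)P_{st} - A_{st}\,(m_{s.} - m_{st})\right\} \tag{49}$$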


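Equation (37) then becomes
$$0 = N\left(-hHA - \frac{r\chi}{N}\right)\mathfrak{S}(dh)^2 + N\left(-kKB - \frac{r\chi}{N}\right)\mathfrak{S}(dk)^2 + \left\{\frac{r}{1 - r^2} + \frac{(h - rk)(k - rh)}{(1 - r^2)^2}\right\}\chi\,\sigma_r^2$$
$$- 2\,\frac{h - rk}{1 - r^2}\,\chi\,\mathfrak{S}(dh\,dr) - 2\,\frac{k - rh}{1 - r^2}\,\chi\,\mathfrak{S}(dk\,dr) + 2\chi\,\mathfrak{S}(dh\,dk) \tag{50}$$
Substituting the values of the $\mathfrak{S}$'s, and transposing $\sigma_r^2$ we find
$$-\left\{\frac{r}{1 - r^2} + \frac{(h - rk)(k - rh)}{(1 - r^2)^2}\right\}\chi\,\sigma_r^2 = -N\left(hHA + \frac{r\chi}{N}\right)\frac{m_{s.}\left(1 - \frac{m_{s.}}{N}\right)}{N^2H^2} - N\left(kKB + \frac{r\chi}{N}\right)\frac{m_{.t}\left(1 - \frac{m_{.t}}{N}\right)}{N^2K^2}$$
$$+ 2\,\frac{h - rk}{1 - r^2}\,\chi\cdot\frac{1}{\chi NH}\left\{\left(1 - \frac{m_{s.}}{N}\right)P_{st} - B_{st}\,(m_{.t} - m_{st})\right\} + 2\,\frac{k - rh}{1 - r^2}\,\chi\cdot\frac{1}{\chi NK}\left\{\left(1 - \frac{m_{.t}}{N}\right)P_{st} - A_{st}\,(m_{s.} - m_{st})\right\}$$
$$+ 2\chi\cdot\frac{m_{st} - \frac{m_{s.}\,m_{.t}}{N}}{N^2HK} \tag{51}$$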


When r = 0 and $h_1 = -h_2$, $k_1 = -k_2$, $\chi$ reduces to NHK, etc., as above and the
equation reduces to
$$-NHK\,h_sk_t\,\sigma_r^2 = -\frac{h_s\,m_{s.}\,m_{.t}}{N^2H}\left(1 - \frac{m_{s.}}{N}\right) - \frac{k_t\,m_{s.}\,m_{.t}}{N^2K}\left(1 - \frac{m_{.t}}{N}\right) \tag{52}$$

If we take this equation for each of the four points 11, 12, 21, 22 and combine
them according to the scheme $n_{22} = m_{11} - m_{12} - m_{21} + m_{22}$ as before, the left
member of the equation becomes
$$-NHK\,(h_1k_1 - h_1k_2 - h_2k_1 + h_2k_2)\,\sigma_r^2 = -NHK\,(h_1k_1 + h_1k_1 + h_1k_1 + h_1k_1)\,\sigma_r^2 = -4NHK\,h_1k_1\,\sigma_r^2 \tag{53}$$


In a similar manner the right side reduces to 








h my. k m. " 
WH, (me + v Nan) + WK, (‘mn “+ WV nan) py cebet pace nee (54), 
which may be further simplified as follows 


wa mM. 
9 TF Pas “= Map 7 WN (m2 — 2m) 
N
M,.M.g Mo. 
i a (55). 
Similarly May + “T7 thgg = 2 IE csssstesseeeeesesnssssee (56). 
Hence substituting we have 

_— 4NHKh,k,o,? = 2 7 ie ™,. + ? ma} ee eeeeeresseee (57) 

“igen. Sai ky 
and o,? = INSHKM, k, if, Mm,.+ K, m.| sonarus onesie (58) 


($h_1$ and $k_1$ are of course negative numbers) which remains finite when the extreme
categories are equal and r = 0.


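(2) When we put $h_2 = k_2 = \infty$ the enneachoric table degenerates into a tetrachoric
table and we have
$$n_{13} = n_{23} = n_{33} = 0, \qquad n_{31} = n_{32} = n_{33} = 0, \qquad n_{3.} = n_{.3} = 0,$$
$$\chi_{12} = \chi_{21} = \chi_{22} = 0,$$
and
$$m = \frac{\alpha_{12}\,n_{1.} + \beta_{21}\,n_{.1} + n_{22}}{N}.$$
But
$$\alpha_{12} = A_{12} - A_{11} = G\left(\frac{k_2 - rh_1}{\sqrt{1 - r^2}}\right) - G\left(\frac{k_1 - rh_1}{\sqrt{1 - r^2}}\right) = 1 - A_{11}.$$
Similarly $\beta_{21} = 1 - B_{11}$.
Hence
$$m = \frac{(1 - A_{11})\,n_{1.} + (1 - B_{11})\,n_{.1} + N - m_{1.} - m_{.1} + m_{11}}{N} = \frac{N - (A_{11}\,n_{1.} + B_{11}\,n_{.1} - m_{11})}{N} = 1 - \frac{P_{11}}{N},$$
$$\therefore\ \chi_{11}^2\,\sigma_r^2 = \left(\frac{P_{11}}{N} - A_{11} - B_{11} + 1\right)^2 n_{11} + \left(\frac{P_{11}}{N} - B_{11}\right)^2 n_{21} + \left(\frac{P_{11}}{N} - A_{11}\right)^2 n_{12} + \left(\frac{P_{11}}{N}\right)^2 n_{22} \tag{59}$$

This form of the standard deviation of a tetrachoric correlation will be referred
to again later, and will then be reduced to a still more symmetrical form.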
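The tetrachoric form lends itself to direct computation. The following sketch evaluates the right side of that expression for given points of division h and k, correlation r and sample size N, obtaining the quadrant frequency from the Hermite-polynomial form of the tetrachoric series (an assumption made for the example; the memoir would use tables). At r = 0 the result can be compared with the closed form in which the variance equals τ(1-τ)θ(1-θ) divided by N H² K². All function names are hypothetical.

```python
from math import erf, exp, factorial, pi, sqrt


def phi(x):
    return exp(-0.5 * x * x) / sqrt(2 * pi)


def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))


def hermite(n, x):
    """Hermite polynomial He_n(x) by the three-term recurrence."""
    a, b = 1.0, x
    if n == 0:
        return a
    for k in range(1, n):
        a, b = b, x * b - k * a
    return b


def lower_quadrant(h, k, r, terms=80):
    """Proportion with x < h and y < k from the tetrachoric series."""
    total = Phi(h) * Phi(k)
    for n in range(1, terms + 1):
        total += (r ** n * hermite(n - 1, h) * hermite(n - 1, k)
                  * phi(h) * phi(k) / factorial(n))
    return total


def tetrachoric_sd(h, k, r, N):
    """Standard deviation of r for a 2 x 2 table divided at (h, k)."""
    s = sqrt(1 - r * r)
    A = Phi((k - r * h) / s)
    B = Phi((h - r * k) / s)
    chi = N * exp(-(h * h + k * k - 2 * r * h * k) / (2 * s * s)) / (2 * pi * s)
    m_row, m_col = N * Phi(h), N * Phi(k)        # marginal totals below h, below k
    m11 = N * lower_quadrant(h, k, r)            # leading quadrant
    g = (A * m_row + B * m_col - m11) / N        # P / N
    n11, n21, n12 = m11, m_col - m11, m_row - m11
    n22 = N - m_row - m_col + m11
    rhs = ((g - A - B + 1) ** 2 * n11 + (g - B) ** 2 * n21
           + (g - A) ** 2 * n12 + g ** 2 * n22)
    return sqrt(rhs) / chi


if __name__ == "__main__":
    print(tetrachoric_sd(0.4, -0.3, 0.0, 500))
    print(tetrachoric_sd(0.4, -0.3, 0.5, 500))
```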


From the above it will be seen that we can determine the correlation coefficient
from any frequency block, assuming a normal distribution, but the accuracy of
this determination varies with the position and size of the frequency block. The
probable error (.67449$\sigma_r$) varies from cell to cell, and an unlucky choice of the
working cell may lead to a correlation coefficient with large probable error. A correction
of this latitude might in some cases be obtained by using another cell. But as the
r from this cell would probably differ from that previously found, and as neither of
them would be identical with that of the normal surface from which they are
supposed to be sampled, we must find some means of approximating to the "best" r.
The most general method of doing this, following on the above, would seem to be to
weight each of the frequency blocks and determine the weights so that the
resulting probable error of the weighted r is a minimum. In doing this we must
have regard to the fact that the variates we are dealing with are not independent
but correlated. We must consider this method.


Let the polychoric table have p rows and q columns so that the indices of the
last row and column are 1q, 2q, ... pq and p1, p2, ... pq respectively, and let each of
the frequency volumes be weighted by an arbitrary weight $w_{11}, w_{12}, \ldots$ indicated by
the same suffix as their respective cells.


Then $w_{11}n_{11} + w_{12}n_{12} + \ldots$
$= w_{11}m_{11} + w_{12}(m_{12} - m_{11}) + w_{21}(m_{21} - m_{11}) + w_{22}(m_{22} - m_{12} - m_{21} + m_{11}) + \ldots$
$\quad + w_{st}(m_{st} - m_{s-1,t} - m_{s,t-1} + m_{s-1,t-1}) + \ldots$
$= (w_{11} - w_{12} - w_{21} + w_{22})\,m_{11} + \ldots + (w_{st} - w_{s+1,t} - w_{s,t+1} + w_{s+1,t+1})\,m_{st} + \ldots$

$= \omega_{11}m_{11} + \omega_{12}m_{12} + \ldots + \omega_{st}m_{st} + \ldots$ (60).
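Then $\frac{1}{N}(w_{11}n_{11} + w_{12}n_{12} + \ldots) = \frac{1}{N}(\omega_{11}m_{11} + \omega_{12}m_{12} + \ldots)$
$= \omega_{11}({}_1\tau_0\,{}_1\tau'_0 + \Theta_{11}) + \omega_{12}({}_1\tau_0\,{}_2\tau'_0 + \Theta_{12}) + \ldots$ (61),


which is an equation to find r, using $\Theta_{st}$ as an abbreviation for
${}_s\tau_1\,{}_t\tau'_1\,r + {}_s\tau_2\,{}_t\tau'_2\,r^2 + {}_s\tau_3\,{}_t\tau'_3\,r^3 +$ etc.
(Compare the usage in equations (2) to (5).)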


$m_{1q}, m_{2q}, \ldots\ m_{p1}, m_{p2}, \ldots$ are complete segments of the normal solid and are
independent of r, i.e. $m_{sq}/N = {}_s\tau_0\,{}_q\tau'_0$, and these terms disappear from both sides of the
equation. The w's of the cells in the last row and last column of the table may therefore
be dropped and we have (p - 1)(q - 1) w's to determine. The probable error may
be written down in the manner already shown and as there are (p - 1)(q - 1)
independent frequencies there is sufficient data to determine the w's so that the
probable error is a minimum.


There is no essential difficulty in carrying this out except that the coefficients 
rapidly become very cumbersome as tables increase beyond 3 x 3 for then the 
simplification dm,. = — dn., is no longer available. It will be found however that 
the same result may be derived more simply from the method discussed below. 


§ 6. POLYCHORIC METHOD.


The frequency surface divided into p columns and q rows is divided at each
point 11, 12 ... into four quadrants, and for each of these divisions a value for r,
viz., $r_{11}, r_{12}, \ldots$, may be found by the tetrachoric method. These may be regarded
as approximations to the true value of r, and their weighted mean found, the weights
being determined so that the probable error of the mean r so found shall be a
minimum.

Let $r = \dfrac{C_{11}r_{11} + C_{12}r_{12} + \ldots}{C_{11} + C_{12} + \ldots}$ (62).

Then $(C_{11} + C_{12} + \ldots)\,dr = C_{11}\,dr_{11} + C_{12}\,dr_{12} + \ldots$ (63).


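Squaring, summing for all possible values and dividing by the number of
samples,

$(\Sigma C_{st})^2\,\sigma_r^2 = \Sigma\,(C_{st}\sigma_{st})^2 + 2\Sigma\,(C_{st}C_{s't'}\,\sigma_{st}\sigma_{s't'}\,R_{st,s't'})$ (64),

where $\sigma_{st} = \sigma_{r_{st}}$, $R_{st,s't'} = r_{r_{st}r_{s't'}}$.

Let $S = \Sigma\,(C_{st}\sigma_{st})^2 + 2\Sigma\,(C_{st}C_{s't'}\,\sigma_{st}\sigma_{s't'}\,R_{st,s't'})$,
$C = \Sigma\,(C_{st})$.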
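Then for a minimum

$dS = \dfrac{\partial S}{\partial C_{11}}\,dC_{11} + \dfrac{\partial S}{\partial C_{12}}\,dC_{12} + \ldots = 0$,
$dC = dC_{11} + dC_{12} + \ldots = 0$,

$\therefore \left(\dfrac{\partial S}{\partial C_{11}} - \lambda\right) dC_{11} + \left(\dfrac{\partial S}{\partial C_{12}} - \lambda\right) dC_{12} + \ldots = 0$,

$\therefore C_{11}\sigma_{11}^2 + C_{12}\sigma_{11}\sigma_{12}R_{11,12} + C_{13}\sigma_{11}\sigma_{13}R_{11,13} + \ldots = \lambda$,
$C_{11}\sigma_{11}\sigma_{12}R_{11,12} + C_{12}\sigma_{12}^2 + C_{13}\sigma_{12}\sigma_{13}R_{12,13} + \ldots = \lambda$ (65).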
as ee san cee eeeet 7 Sipe Se 
O11 P12 P13 
1 
| ag 1 Ry,19 Ry, 13 oe 
Au| ** (66) 
soll OR alae 4 
aa Ry, 12 1 Ryo,13 
| 1 
— Rar Ryo, 13 ] 
| P13 
AA 
Then C1104; — wi 
KAS... 
C1202 = = ps COC CCC Coe rererececenereeeeeeecCeCe (67) 
Since A is arbitrary we may put = (C,,) = 1, 
A 4 A Aga 
= (Cx) = — =— {om 4 ae 4 ee a 
Acooo 1 
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 & (C5401)? + 22 (CuCyroneyy Ry, vt’) 


1 / 
= Ka Bg 2 [On0m (S + Aout Bie + Aenea 
+ Aoost +... — “eno 








Ost 
By ey vot re en) 
ore 7 Sgione ue xa Mate NE tone me (71), 
and since XC,, = 1 o,2 = xu A iitietesesessssseseeenenessnsees asa 


§ 7. COMPARISON OF POLYCHORIC AND ENNEACHORIC COEFFICIENTS OF
CORRELATION.


We may now compare the polychoric coefficient ($r_p$) with the enneachoric
coefficient ($r_e$) previously found.


Since 2 (C51) — 1, 
ly = Cntu 4 Cy21 30 + Ce eorecccrvesccccesccccsosece (73), 
tle Dok. + Oeil, + 0g to... a, 
Xu X12 


ve ao (C,,dr1, + Cyd 14. a nee) 
= ou (Ay, dm. + By,dm., — dm) + on (A,.dm,. + By,dm., — dm,,) + ete. 


12 
Transposing and rearranging we find 


Cy 


11 





C 
dm, a dm, a eee 


C C 
= oa (A,,dm,. + By dm. + x47) + ig (Asem B,.dm..+X12412) + ete. 
1 


_ Cn (Fn gy, 4 Ye ap, 4 Lor ay,.) 4 G2 (Se ap, 4 See gp, 4. Yo 
Xu (a ta ee dry.) + (ar dhy + ape diy + 5 dria) + 


h, (ht 
where Su=N [ | z(x,y, r) dady = my, 


—c 


be ics. a es Ss 
My, + Myt ... = (179 199 + Dro) + (17 299 + Diz) + «.. 
Xu Xi X12 


Sy 
2 


using $\Theta_{st}$ as in equation (61), page 106, which is an equation of which $r_p$ is a root.
If in this equation we put $C_{11} = \chi_{11}$, $C_{12} = -\chi_{12}$, $C_{21} = -\chi_{21}$, $C_{22} = \chi_{22}$, and the
remaining C's = 0 we have the equation for the enneachoric coefficient and $r_e$
now appears to be the weighted mean of four tetrachoric r's,

$r_e = \dfrac{\chi_{11}r_{11} - \chi_{12}r_{12} - \chi_{21}r_{21} + \chi_{22}r_{22}}{\chi_{11} - \chi_{12} - \chi_{21} + \chi_{22}}$.

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Comparing this with the generalized equation (61) for enneachoric r previously 
found in which all the frequency volumes were weighted, we have 


Wy Ss 
Xu 
_ Cry 
=" 2 
X12 
C. 
st 
Ong =-——>, 
Xse 
and since $\omega_{st} = w_{st} - w_{s+1,t} - w_{s,t+1} + w_{s+1,t+1}$,

it can be easily shown* if there are p rows and q columns that

$w_{st} = \omega_{st} + \omega_{s,t+1} + \omega_{s,t+2} + \ldots + \omega_{s,q-1}$
$\quad + \omega_{s+1,t} + \omega_{s+1,t+1} + \ldots + \omega_{s+1,q-1}$
$\quad + \ldots$
$\quad + \omega_{p-1,t} + \omega_{p-1,t+1} + \ldots + \omega_{p-1,q-1}$ (78),

that is the sum of all the weights having the same suffixes as the points contained
within the d quadrant of which $n_{st}$ occupies the corner.
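
The passage from the cell weights w to the cumulative weights ω, and back again by relation (78), can be checked numerically. The following is only a minimal sketch: the function names and the small 3 × 3 tables are hypothetical, and weights beyond the edge of the table are assumed to be zero, as the dropped last row and column imply.

```python
def second_difference(w):
    """omega[s][t] = w[s][t] - w[s+1][t] - w[s][t+1] + w[s+1][t+1].

    Weights outside the table are taken as zero (an assumption of this sketch).
    """
    p, q = len(w), len(w[0])

    def at(s, t):
        return w[s][t] if s < p and t < q else 0

    return [[at(s, t) - at(s + 1, t) - at(s, t + 1) + at(s + 1, t + 1)
             for t in range(q)] for s in range(p)]


def quadrant_sum(omega):
    """Relation (78): w[s][t] is the sum of omega over all l >= s, m >= t."""
    p, q = len(omega), len(omega[0])
    return [[sum(omega[l][m] for l in range(s, p) for m in range(t, q))
             for t in range(q)] for s in range(p)]


def cumulative(n):
    """m[s][t] is the sum of the cell frequencies n over all l <= s, m <= t."""
    p, q = len(n), len(n[0])
    return [[sum(n[l][m] for l in range(s + 1) for m in range(t + 1))
             for t in range(q)] for s in range(p)]


def weighted_total(weights, values):
    """Sum of weight times value over every cell of the table."""
    return sum(wv * v
               for w_row, v_row in zip(weights, values)
               for wv, v in zip(w_row, v_row))


# Hypothetical weights (last row and column dropped, i.e. zero) and frequencies.
w = [[5, 3, 0], [2, 7, 0], [0, 0, 0]]
n = [[40, 12, 8], [10, 30, 15], [5, 9, 21]]

omega = second_difference(w)
m = cumulative(n)
```

With these tables `quadrant_sum(omega)` returns `w` again, and `weighted_total(w, n)` agrees with `weighted_total(omega, m)`, which is the content of equation (60).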


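* Consider a two-fold extension ruled and named in a manner similar to the polychoric scheme on
page 94.
Then $m_{11} = n_{11}$,
$m_{12} = n_{11} + n_{12}$,
$m_{st} = n_{11} + n_{12} + n_{13} + \ldots + n_{1t}$
$\quad + n_{21} + n_{22} + n_{23} + \ldots + n_{2t}$
$\quad + \ldots$
$\quad + n_{s1} + n_{s2} + n_{s3} + \ldots + n_{st}$.
Hence $\omega_{11}m_{11} + \omega_{12}m_{12} + \ldots = \omega_{11}n_{11} + \omega_{12}(n_{11} + n_{12}) + \ldots + \omega_{st}(n_{11} + n_{12} + \ldots + n_{st}) + \ldots$
If we rearrange this in terms of $n_{11}, n_{12}, \ldots$ we shall have
$n_{11}(\omega_{11} + \omega_{12} + \ldots + \omega_{st} + \ldots)$
$\quad + n_{12}(\omega_{12} + \omega_{13} + \ldots + \omega_{st} + \ldots) + \ldots$
$= w_{11}n_{11} + w_{12}n_{12} + \ldots + w_{st}n_{st} + \ldots + w_{pq}n_{pq}$.
It is clear that $w_{st}$ will be the sum of the ω's belonging to all the m's of which $n_{st}$ is a constituent,
that is, from the figure, all the m's whose boundary lines lie beyond the lines h = s - 1, k = t - 1, i.e.
$w_{st} = \Sigma\Sigma\,\omega_{lm}$, summed for $l \ge s$, $m \ge t$.
The relation $n_{st} = m_{st} - m_{s-1,t} - m_{s,t-1} + m_{s-1,t-1}$
may be compared to the partial finite difference in two variables
$\Delta_x\Delta_y\,u_{x,y} = u_{x+1,y+1} - u_{x+1,y} - u_{x,y+1} + u_{x,y}$,
which may help to make the above relation clearer.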
§ 8. COMPUTATION OF R's AND σ's.


As the computation of the polychoric r even for an enneachoric table is somewhat lengthy, it
is necessary to have a definite scheme to work to. In addition to this the values of
the R's when resolved into their constituents present some interesting features.


A new expression for tetrachoric r has already been deduced from the
degeneration of an enneachoric table. The following is a derivation of it directly and in a
more symmetrical form.


Consider the tetrachoric table

                    D
             a      |      b      |  n_.1
        F ----------E---------- L
             c      |      d      |  n_.2
                    G
            n_1.    |     n_2.    |   N








Let A and B have the same significance as before, i.e. A is the fraction which
the area of the plane DE is of the whole dichotomic plane and B the same for FE,
and write
${}_aP = A\,n_{1\cdot} + B\,n_{\cdot 1} - a$ (79),
where the a suffix is used to indicate that it is the P of the leading or a quadrant.


Then since the fractional area of EG will be 1 - A and of EL, 1 - B, the
corresponding P for the b quadrant will be

${}_bP = A\,n_{2\cdot} + (1 - B)\,n_{\cdot 1} - b$
$= A(N - n_{1\cdot}) + (1 - B)\,n_{\cdot 1} - (n_{\cdot 1} - a)$
$= AN - (A\,n_{1\cdot} + B\,n_{\cdot 1} - a)$
$= AN - {}_aP$ (80).

Similarly ${}_cP = BN - {}_aP$ (81),

${}_dP = {}_aP - N(A + B - 1)$ (82).

Hence we have ${}_aP + {}_bP + {}_cP + {}_dP = N$ (82) bis.
We have already seen (20) that

$-\chi\,dr = A\,dm_{1\cdot} + B\,dm_{\cdot 1} - dm_{11} = \delta\,{}_aP$ (83).


‘ 


Using the symbol $\mathfrak{S}$ as before to denote the operation "sum for all possible
samples and divide by the number of samples"* we have


* It would be useful to have a distinctive name for this operation, verb as well as substantive. 











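$\chi^2\sigma_r^2 = \mathfrak{S}\,(\delta\,{}_aP)^2$
$= \mathfrak{S}\,(A\,dm_{1\cdot} + B\,dm_{\cdot 1} - dm_{11})^2$
$= A^2 m_{1\cdot} + B^2 m_{\cdot 1} + m_{11} + 2AB\,m_{11} - 2A\,m_{11} - 2B\,m_{11} - \dfrac{{}_aP^2}{N}$
$= A^2(a + c) + B^2(a + b) + a + (2AB - 2A - 2B)\,a - \dfrac{{}_aP^2}{N}$
$= (A + B - 1)^2 a + A^2 c + B^2 b - \dfrac{{}_aP^2}{N}$ (84).
But
${}_aP = A(a + c) + B(a + b) - a = (A + B - 1)\,a + Ac + Bb + 0 \cdot d$ (85)
and $a + b + c + d = N$,
$\therefore \mathfrak{S}\,(\delta\,{}_aP)^2$
$= \left(A + B - 1 - \dfrac{{}_aP}{N}\right)^2 a + \left(B - \dfrac{{}_aP}{N}\right)^2 b + \left(A - \dfrac{{}_aP}{N}\right)^2 c + \left(0 - \dfrac{{}_aP}{N}\right)^2 d$
$= \left(-\dfrac{{}_dP}{N}\right)^2 a + \left(\dfrac{{}_cP}{N}\right)^2 b + \left(\dfrac{{}_bP}{N}\right)^2 c + \left(-\dfrac{{}_aP}{N}\right)^2 d$
$= \dfrac{1}{N^2}\{{}_dP^2\,a + {}_cP^2\,b + {}_bP^2\,c + {}_aP^2\,d\}$ (86).
Further $(-{}_dP)\,a + ({}_cP)\,b + ({}_bP)\,c + (-{}_aP)\,d$
$= a\{N(A + B - 1) - {}_aP\} + b\,(BN - {}_aP) + c\,(AN - {}_aP) + d\,(-{}_aP)$
$= N\{(A + B - 1)\,a + Bb + Ac\} - N\,{}_aP$
$= N\{A(a + c) + B(a + b) - a\} - N\,{}_aP$
$= 0$ (87).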

The above form (86) of the square of the standard deviation (omitting factor)
is interesting as involving only the squares of the P's. Since the P's are connected
by the relation (82) bis and (87) their values may be determined from any two of
them.
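
The agreement of the two forms (84) and (86), and the vanishing sum (87), can be verified on any tetrachoric table. The sketch below is illustrative only: the function names and the numerical values of a, b, c, d, A and B are hypothetical, and it assumes the relations ${}_bP = AN - {}_aP$, ${}_cP = BN - {}_aP$, ${}_dP = {}_aP - N(A + B - 1)$, which follow from the definition of ${}_aP$ given in the text.

```python
def quadrant_P(a, b, c, d, A, B):
    """Return N and the four P's of a tetrachoric table.

    a, b are the upper cells and c, d the lower cells, so that
    n_1. = a + c goes with A and n_.1 = a + b goes with B.
    """
    N = a + b + c + d
    P_a = A * (a + c) + B * (a + b) - a
    P_b = A * N - P_a
    P_c = B * N - P_a
    P_d = P_a - N * (A + B - 1)
    return N, P_a, P_b, P_c, P_d


def variance_form_84(a, b, c, d, A, B):
    """chi^2 sigma_r^2 written as in equation (84)."""
    N, P_a, _, _, _ = quadrant_P(a, b, c, d, A, B)
    return (A + B - 1) ** 2 * a + A ** 2 * c + B ** 2 * b - P_a ** 2 / N


def variance_form_86(a, b, c, d, A, B):
    """chi^2 sigma_r^2 written as a sum of squares of the P's, equation (86)."""
    N, P_a, P_b, P_c, P_d = quadrant_P(a, b, c, d, A, B)
    return (P_d ** 2 * a + P_c ** 2 * b + P_b ** 2 * c + P_a ** 2 * d) / N ** 2


def relation_87(a, b, c, d, A, B):
    """Left-hand side of (87); it should vanish."""
    _, P_a, P_b, P_c, P_d = quadrant_P(a, b, c, d, A, B)
    return -P_d * a + P_c * b + P_b * c - P_a * d


# Hypothetical frequencies and fractional areas, chosen only for illustration.
table = (40, 10, 15, 35, 0.6, 0.55)
```

Evaluating `variance_form_84(*table)` and `variance_form_86(*table)` gives the same value, and `relation_87(*table)` is zero up to rounding.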


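The R's.
Since $-\chi_{st}\,\delta r_{st} = \delta P_{st}$
and $-\chi_{s't'}\,\delta r_{s't'} = \delta P_{s't'}$,
$\chi_{st}\chi_{s't'}\,\delta r_{st}\,\delta r_{s't'} = \delta P_{st}\,\delta P_{s't'}$
and $\chi_{st}\chi_{s't'}\,\sigma_{st}\sigma_{s't'}\,R_{st\cdot s't'} = \mathfrak{S}\,(\delta P_{st}\,\delta P_{s't'}) = S_{st\cdot s't'}$ (say).


In conformity with this notation
$\chi_{st}^2\,\sigma_{st}^2 = S_{st\cdot st}$.
It is useful to have a verbal rule for writing down such mean products as
$\mathfrak{S}\,(\delta P_{st}\,\delta P_{s't'})$.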

The following will serve. 

Multiply the detached coefficients of the differentials in the δP's as in ordinary
multiplication; strike out the products in which the related frequencies have no
common frequency and insert the common part of the frequencies after the related
coefficients. From the whole subtract the full products of the P's divided by the
total frequency. This may be proved as follows:
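Let p and q be any frequencies in a given distribution in which the population
N is so large that sampling does not alter its composition; then we have the
well-known results (p. 99)

$\sigma_p^2 = p\left(1 - \dfrac{p}{N}\right)$,
$\mathfrak{S}\,(dp\,dq) = -\dfrac{pq}{N}$.
Now let p and q have a common part c so that
$p = p' + c$,
$q = q' + c$.
Then $\mathfrak{S}\,(dp\,dq) = \mathfrak{S}\,d(p' + c) \cdot d(q' + c)$
$= \mathfrak{S}\,(dp'\,dq' + dp'\,dc + dq'\,dc + dc^2)$
$= -\dfrac{p'q'}{N} - \dfrac{p'c}{N} - \dfrac{q'c}{N} + c\left(1 - \dfrac{c}{N}\right)$
$= c - \dfrac{(p' + c)(q' + c)}{N}$
$= c - \dfrac{pq}{N}$.

Now the mean product of any two linear functions of p's and q's,
$S = \mathfrak{S}\,d(i_1 p_1 + i_2 p_2 + \ldots) \cdot d(k_1 q_1 + k_2 q_2 + \ldots)$,
will consist of the sum of the mean products of terms such as
$i_s\,dp_s \cdot k_t\,dq_t$.
But $\mathfrak{S}\,(i_s\,dp_s \cdot k_t\,dq_t) = i_s k_t\,\mathfrak{S}\,(dp_s\,dq_t)$
$= i_s k_t\left(c_{st} - \dfrac{p_s q_t}{N}\right)$,
where $c_{st}$ is the common part of $p_s$ and $q_t$.


Therefore $S = \Sigma\,i_s k_t\left(c_{st} - \dfrac{p_s q_t}{N}\right)$
$= \Sigma\,i_s k_t\,c_{st} - \dfrac{\Sigma\,i_s p_s \cdot \Sigma\,k_t q_t}{N}$.
Hence the rule.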
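
The mean product of two overlapping frequencies, c - pq/N, can be confirmed by exact enumeration of a small sample drawn with replacement. This is a sketch under stated assumptions, not part of the original argument: the population is modelled by four hypothetical cell probabilities (p only, common part, q only, neither) and the sample is kept small so that every outcome can be listed.

```python
from itertools import product
from math import prod


def exact_mean_product(probs, draws):
    """Covariance of the two overlapping counts, by listing every outcome.

    probs gives the chances of the four cells:
    0 = in p only, 1 = common part c, 2 = in q only, 3 = in neither.
    """
    mean_p = mean_q = mean_pq = 0.0
    for outcome in product(range(4), repeat=draws):
        weight = prod(probs[cell] for cell in outcome)
        count_p = sum(1 for cell in outcome if cell in (0, 1))
        count_q = sum(1 for cell in outcome if cell in (1, 2))
        mean_p += weight * count_p
        mean_q += weight * count_q
        mean_pq += weight * count_p * count_q
    return mean_pq - mean_p * mean_q


def rule_mean_product(probs, draws):
    """The rule of the text: c - p q / N, with p, q, c the expected frequencies."""
    p = draws * (probs[0] + probs[1])
    q = draws * (probs[1] + probs[2])
    c = draws * probs[1]
    return c - p * q / draws


# Hypothetical cell probabilities and a sample of five, for illustration only.
cell_probs = (0.2, 0.3, 0.1, 0.4)
sample_size = 5
```

For these values `exact_mean_product(cell_probs, sample_size)` and `rule_mean_product(cell_probs, sample_size)` coincide; setting the common probability to zero reduces the rule to -pq/N.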
As an example consider S,,;.9;, 
Siro = > (8P1,5P.;) 
= $ (A,,dm,. + By, dm., — dm) (Aqdm,. + Byydm., — dmg) 
= Ay, Ay my. + Ay Boy my, — Ay my + By Ag mg, + By Byym., — By mg, 
~ Ag my — Byymy + My — Puls 
= (4, + By — 1) (Ada, + By — 1) my + By (Aer + Bay — 1) ny + By, Bans; 
+ Ay, Ag, 112 +0. Ag,.Noe + 0.0. Mg. 
+ Ay Ag 43 + 0. Agy.Mo3 +0.0. M55 
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1 ( (Ay, + By — 1) my + By Mg + Buna) 

—_ rai Ay N42 -|- 0 e Noe + 0 . N39 | 
a Ay N43 + 0. Meg + 0. Nog 

(Ag, + Byy — 1) my + (Ae, + By — 1) No, + Bang, 
“Sin te Ag Mq + Agi Moo + 0. Ngo 
at Ag, M33 + Agi Mog + 0. Nog 
P P 
= (4n +8,-i=- W) (42 + B,—1- 7) 


P. PN. 7 
Bo (Bn = 7) (4m + By, —1- =| Noy + (Bu = 7) (Ba - 7) Nsy 


/ 


P P P; P 
7. (4n = =) (4m = *) Myo + ( ie 7) (40 = ¥) Noe 
Py; P P3,\ / P 
+ (0 sie ¥) (0 ad W) M32 + (4n re +) (da = #) Ny3 


P P Pi\/, P 
r (0 r =) (40 Pa =) rae ( a 7) 0— +) Weck ne (88). 


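In the above the P's are ${}_aP$'s and remembering that
$A_{11} + B_{11} - 1 - \dfrac{{}_aP_{11}}{N} = -\dfrac{{}_dP_{11}}{N}$, etc.,
we have

$S_{11\cdot 21} = \left(-\dfrac{{}_dP_{11}}{N}\right)\left(-\dfrac{{}_dP_{21}}{N}\right) n_{11} + \left(\dfrac{{}_cP_{11}}{N}\right)\left(-\dfrac{{}_dP_{21}}{N}\right) n_{21} + \left(\dfrac{{}_cP_{11}}{N}\right)\left(\dfrac{{}_cP_{21}}{N}\right) n_{31}$
$\quad + \left(\dfrac{{}_bP_{11}}{N}\right)\left(\dfrac{{}_bP_{21}}{N}\right) n_{12} + \left(-\dfrac{{}_aP_{11}}{N}\right)\left(\dfrac{{}_bP_{21}}{N}\right) n_{22} + \left(-\dfrac{{}_aP_{11}}{N}\right)\left(-\dfrac{{}_aP_{21}}{N}\right) n_{32}$
$\quad + \left(\dfrac{{}_bP_{11}}{N}\right)\left(\dfrac{{}_bP_{21}}{N}\right) n_{13} + \left(-\dfrac{{}_aP_{11}}{N}\right)\left(\dfrac{{}_bP_{21}}{N}\right) n_{23} + \left(-\dfrac{{}_aP_{11}}{N}\right)\left(-\dfrac{{}_aP_{21}}{N}\right) n_{33}$

$= \dfrac{1}{N^2}\{{}_dP_{11}\,{}_dP_{21}\,n_{11} - {}_cP_{11}\,{}_dP_{21}\,n_{21} + {}_cP_{11}\,{}_cP_{21}\,n_{31}$
$\quad + {}_bP_{11}\,{}_bP_{21}\,(n_{12} + n_{13}) - {}_aP_{11}\,{}_bP_{21}\,(n_{22} + n_{23}) + {}_aP_{11}\,{}_aP_{21}\,(n_{32} + n_{33})\}$ (89).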


The relation between the coefficients in the above expression is very simple.
We have already seen that

$N^2\chi^2\sigma_r^2 = a\,(-{}_dP)^2 + b\,({}_cP)^2 + c\,({}_bP)^2 + d\,(-{}_aP)^2$ (90).
In the quadrants of a tetrachoric table write the P coefficients. Thus

        - dP  |    cP
        ------+------
          bP  |  - aP

The a frequency is related to $-{}_dP$, etc.

Consider now the empty scheme of an enneachoric table regarded as a tetrachoric 
table with the point of division first at, say, 11 and second at 12, and write in the P 
coefficients as above. 


Biometrika xu 
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Divided at 11. 

















—aPy | Pu Py 

=—=—=====_= Reaey — — 
Puy = Pa | - Pa 
Pa | —ePu | - Pa 


bee Tic ie | Px 











ewe prs | ni 








! 
oP Poy | — Px 

Now if we superpose these two schemes upon an enneachoric table with a 

frequency in each cell, each cell will then contain the P coefficient and frequency 


of each term of the expansion of S,,.> . the omission of the factor xe) thus 


(P11) (— aPe1) Me | (c Pu) (-Po1) 31 


— aPy) (— Pau) 1 





(P11) G P23) Nye | (- aP 33) (Px) Noo | (—«¢ Pu) (— aPe1) Nee 


(oP 11) (oPe1) M13 | (— oP) (oP o1) %g | (— oP ae - Px) Ngg 


When 11 coincides with 21, R becomes = 1 and the mean product degenerates 
into the square of the standard deviation. 





This may be summarised in the following table in which the letter, a, b, etc.
gives the suffix and the sign gives the sign of the P required:

              P_11    P_12    P_21    P_22
    n_11       -d      -d      -d      -d
    n_12       +b      -d      +b      -d
    n_13       +b      +b      +b      +b
    n_21       +c      +c      -d      -d
    n_22       -a      +c      +b      -d
    n_23       -a      -a      +b      +b
    n_31       +c      +c      +c      +c
    n_32       -a      +c      -a      +c
    n_33       -a      -a      -a      -a

Thus the coefficient of $n_{32}$ in $S_{12\cdot 21}$ is $(+{}_cP_{12})(-{}_aP_{21})$.
This table is sufficient for a polychoric table of any size since any two cross
points st . s't' in the table, with the planes through them divide it into nine portions
or groups of cells, each of which is represented by one of the above cells.
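
The rule embodied in the table may be put as a short function for a cross point of any polychoric table. This is a sketch only: the function name is hypothetical, and it assumes, as the table of signs itself implies, that the first suffix of a cell is compared with the first suffix of the cross point and the second with the second.

```python
def p_coefficient(cell, cross):
    """Sign and quadrantal suffix of the P belonging to a cell.

    cell  = (l, m), the suffixes of the frequency n_lm
    cross = (s, t), the suffixes of the cross point P_st
    A cell with l <= s and m <= t lies in the leading (a) quadrant
    and takes -dP, and so on round the four quadrants.
    """
    l, m = cell
    s, t = cross
    within_first = l <= s
    within_second = m <= t
    if within_first and within_second:
        return "-d"
    if within_first:
        return "+b"
    if within_second:
        return "+c"
    return "-a"


def sign_table_row(cell, crosses=((1, 1), (1, 2), (2, 1), (2, 2))):
    """One row of the table of signs for an enneachoric table."""
    return [p_coefficient(cell, cross) for cross in crosses]
```

For example `sign_table_row((3, 2))` gives the entries -a, +c, -a, +c, so that the coefficient of n_32 in S_12.21 is (+cP_12)(-aP_21) as stated above.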
The relations between two superimposed tetrachoric divisions involve the
determination of ten constants, four σ's, $\sigma_{11}, \sigma_{12}, \sigma_{21}, \sigma_{22}$, and six R's, $R_{11\cdot 12}$, $R_{11\cdot 21}$,
$R_{11\cdot 22}$, $R_{12\cdot 21}$, $R_{12\cdot 22}$, $R_{21\cdot 22}$. The σ's follow the example already given, the proper
suffixes being attached. The value of $S_{11\cdot 21}$ has already been given. The remaining
five S's are as follows:


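$S_{11\cdot 12} = \dfrac{1}{N^2}\{{}_dP_{11}\,{}_dP_{12}\,n_{11} + {}_cP_{11}\,{}_cP_{12}\,(n_{21} + n_{31}) - {}_bP_{11}\,{}_dP_{12}\,n_{12} - {}_aP_{11}\,{}_cP_{12}\,(n_{22} + n_{32})$
$\quad + {}_bP_{11}\,{}_bP_{12}\,n_{13} + {}_aP_{11}\,{}_aP_{12}\,(n_{23} + n_{33})\}$ (91).


$S_{11\cdot 22} = \dfrac{1}{N^2}\{{}_dP_{11}\,{}_dP_{22}\,n_{11} - {}_cP_{11}\,{}_dP_{22}\,n_{21} + {}_cP_{11}\,{}_cP_{22}\,n_{31} - {}_bP_{11}\,{}_dP_{22}\,n_{12} + {}_aP_{11}\,{}_dP_{22}\,n_{22}$
$\quad - {}_aP_{11}\,{}_cP_{22}\,n_{32} + {}_bP_{11}\,{}_bP_{22}\,n_{13} - {}_aP_{11}\,{}_bP_{22}\,n_{23} + {}_aP_{11}\,{}_aP_{22}\,n_{33}\}$ (92).


$S_{12\cdot 21} = \dfrac{1}{N^2}\{{}_dP_{12}\,{}_dP_{21}\,n_{11} - {}_cP_{12}\,{}_dP_{21}\,n_{21} + {}_cP_{12}\,{}_cP_{21}\,n_{31} - {}_dP_{12}\,{}_bP_{21}\,n_{12} + {}_cP_{12}\,{}_bP_{21}\,n_{22}$
$\quad - {}_cP_{12}\,{}_aP_{21}\,n_{32} + {}_bP_{12}\,{}_bP_{21}\,n_{13} - {}_aP_{12}\,{}_bP_{21}\,n_{23} + {}_aP_{12}\,{}_aP_{21}\,n_{33}\}$ (93).


$S_{12\cdot 22} = \dfrac{1}{N^2}\{{}_dP_{12}\,{}_dP_{22}\,(n_{11} + n_{12}) - {}_cP_{12}\,{}_dP_{22}\,(n_{21} + n_{22}) + {}_cP_{12}\,{}_cP_{22}\,(n_{31} + n_{32})$
$\quad + {}_bP_{12}\,{}_bP_{22}\,n_{13} - {}_aP_{12}\,{}_bP_{22}\,n_{23} + {}_aP_{12}\,{}_aP_{22}\,n_{33}\}$ (94).


$S_{21\cdot 22} = \dfrac{1}{N^2}\{{}_dP_{21}\,{}_dP_{22}\,(n_{11} + n_{21}) + {}_cP_{21}\,{}_cP_{22}\,n_{31} - {}_bP_{21}\,{}_dP_{22}\,(n_{12} + n_{22}) - {}_aP_{21}\,{}_cP_{22}\,n_{32}$
$\quad + {}_bP_{21}\,{}_bP_{22}\,(n_{13} + n_{23}) + {}_aP_{21}\,{}_aP_{22}\,n_{33}\}$ (95).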


A more convenient form of the above for actual computation purposes will be 
found on page 120. 


We may now by means of the P’s express the standard deviation 7, in a form 
consisting of sums of squares. 
~ (3C,,) dr == ( SP.) pSiteinatabad ee ces" (96), 


"BP u)t -2( ‘y Sst, or ay, Cs Coe — Sse, st’ 
Xst 


. (ZC,,)2e,2 = = ( 
( We # Xst XstXs't’ 


_ Cn (“ Crs Cis 
Suu t+ —* Si. + —2 Sis t - ) 
Xu \Xu 11,11 Xie 11,12 X13 11,138 


Cie Cis \ 
ia ee + ae S212 + cay Bok teh 055 PE WOR. cavensaccutvees (97). 
Now consider the S’s to be expanded in terms of the frequencies and pick out 
all the coefficients of the frequency n,, say. The coefficient of the n,, taken from 
Sim, vm 8ay will be : Pym.: Pym in which the quadrant suffixes will be determined 
by the relative position of n,, to 1m and I’m’. Let these undetermined P’s be de- 
noted by p. We shall then have as the complete coefficient of n,, 
Cu 


Ci; ae C 
(=2 Pu-Pu Xi Pit Pas +—# Pu-Pi3 + aa 
X11 X12 X13 


Cre (<x Cy. Cis ) 
+ a Payee 2-Pi2 + —~ Pyo- Pig + «.. } + ete. 
r X12 \X11 Pu-Pi2 Xi Pie X13 2? Pis 


Since we are dealing throughout with the cell n,, the quadrantal suffix (i.e. the 
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a, b, etc.) for any J, m will be the same throughout. Hence we may write the 
complete coefficient of n,, as 





Cu Cie Cis 
+—= +— +... 
Xu Pu Xu X12 Pas X13 as ) 
Ci (cu Cie Cis ) 
2 — +— += +...) + ete. 
X12 Pu Xu fa X12 ve X13 Pas 
=(—Uy+ a PS SS ee eee 98). 
Co Pu X12 we X13 re (98) 


In the case of an enneachoric table for example we have, XC,, being = 1, 


C C C C 3 
of = (— <4 P ep Se oe Pas) 
Stes” 11 ta” 12 i." 21 i, 22 11 
C. C CG, C. 2 
+( Pa — Put Pu — 52 Pu) Ny2 
C. C C C. 3 
+ ( x, ut xt a Po, + oP) M3 + CCC. «..... (99), 
21 2 


the P’s being at once written down from the table on page 114. 
Or more generally thus: 


Since the P of any cell ,,, with reference to any cross point (st) is invariable it 
may be written generally as ,,P,,. This notation gives up the recognition of the 
equality of the P’s in any given quadrant but gains in generality. The quadrantal 
suffix and sign may be supplied by inspection. We have then the following lemma: 

Sst,0¢ es S (8P:5P sv) : 
= PsterrP yey Mi t+ oP stoarP ey tar + +. + ePstereP yy Mis + -. 


— >> (1m LP stim? s't’ Mn) PO ee ee ROO USUESOOSOOSOOSOOSOSCOS See eee) (100). 
im 


The standard deviation of r, may then be developed as follows: 


(ZC) dt#=& (CyOr52) 





“aCe 5P,5P ye) 
XstXs't’ 


s. (2C,)*o,2 = = (x SP,,) + 25 ( 
| Cue Cye ~~ on mP ert | Nim 


C,\? 
{x (- ) ae + 2 


st Xst Xs't’ 


sa.) Ma + vas ooh OS). 
§ 9. THE STANDARD DEVIATION OF POLYCHORIC r IN SPECIAL CASES.


The value of the standard deviation when r = 0 is of interest and may be got 
as follows. 


Assuming r = 0 throughout: 





Ay =€ (ky) ee WV 
By =& (A;) aid a , 


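and writing $m'_{s\cdot} = N - m_{s\cdot}$ and $m'_{\cdot t} = N - m_{\cdot t}$, then

${}_aP_{st} = \dfrac{m_{s\cdot}\,m_{\cdot t}}{N}$,

${}_bP_{st} = \dfrac{m'_{s\cdot}\,m_{\cdot t}}{N}$,

${}_cP_{st} = \dfrac{m_{s\cdot}\,m'_{\cdot t}}{N}$,

${}_dP_{st} = \dfrac{m'_{s\cdot}\,m'_{\cdot t}}{N}$.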
Substituting these values in the S’s we have after reduction 
pees am! 104 
LAL = FH Ma Mey MM oy evererererrerereerreens (104), 
819-1, = wa Wi aN BE de Snccirsatronssosecnveed (105), 
, , 
85: -9, = JS Meas Mg My veeereereresereeeeeeeees (106), 
1 — - 
Sso-29 = NS Mae Mag Mg .Mi ag veveereereeeereeeererees (107), 
oe e ‘a! 108 
Syre1e = Jye Ma Mey MyM ag veverrcereeerserrereees (108), 
ee — 
813-5) = NS MereMoy Mg My vevveeeereeererersenen (109), 
Sye-0), = 813-23 = Ni i NN AN ce. dkciivchernsnsncie (110), 
S12-23 = na i Mgt GOW ce. werck ccianscerien nee (111), 
S a ] ’ ’ 112 
Sar-ag = jpg Mae ey M gM og ceeveeseerererereerseres (112). 


From these values we get the o’s and R’s, 


1 jf Sees 1 ) : 
= a  — J. eee eee 113), 
“uu ~ N?H, RK, N ( qu sia 
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ey fj eee =i" 

a= WK. 7 (- = Sans (114), 
may Ba eee ie 

on yRHCK : =) raanted (115), 
1 _, /Mamgm'sm's (_ 1 

on WFHCK, / 7 (- ) ‘Siete (116). 


Since m,.m’., = m,. (N — m,.) = Nom,?. 
_ Lg Ouse, 
wa Wie ae, * 
and similarly for the others 


ate Se 
mm’. 
aeeeey i ~ /%s 2 Oak cee 117 
11°12 21°22 Mm’ Me (= e) (117), 
Seer, 
mM;.M’s. . 
Ruen = (ESS == Pg — (= € ) ee oerereccceces (118), 
P Aor 
m’,.m’.,m’,.m’ 
eer Nd =/! ll a) al 7 Se (119). 
sets ala m’,.™’ ..Mg.Meg ) 


With these values we have 





| 1 Qi Giz Yor Ye | 
e; ft « € « 
Sette 6: © Pe hoes (120). 


i. © am Ths 

| Wn ce OUelCe:Cd | 

Aooo is symmetrical with respect to the centre of the square, hence* 
ee SES Le ee ee a ie 


ete l+e’ | |e-e l—ee’ m’,.m’ .yMg.Mg 


The remaining minors are easily reduced and we have after reduction and sub- 
stitution 


Mooi AoA (1 se <2) (1 = <2) N5 _ A, (4, oa m- H,) K, (K, a 3 K,) 
“2 





on m,.m'. Ms. MyM’ 

Bb eaccepoces (122), 
A ae: M;. K WE ios ; 
ee G-a— oe 2 (e, - Sm) (eK) 
O12 M,.M';. Mo MgM’ og \M oy 

wrSbeeehesee (123), 
A Seg! 4 Me K : hs + 
—0} = = (1 — €*) (1 -- «’#) —* - ee H, — H,) ~—, (K, - "= Ky) 
Oo Ms.M's. \M'y. MyM’ oy Meo 

deiieacec ene (124), 
y ea ; a /m'». K We ca : 
ott — — (1 — of) (1) 8 (7 H — Hy) (8 (7? Ki - Bi) 
O29 Ms.M',. \M's. MyM’ oo \M' oy 

spp eetenests (125) 

* See Scott and Mathews, Theory of Determinants (2nd ed.), p. 89. 
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Summing these four latter expressions we have 


H,2 2H,H H,? 
bs ets, eee oie pg 5 1 o 2 2 
A Aoooo (1 ad ) (1 ai ) = tne a Mz.M',. M2.M } 
10 on Bee. 
x i 


‘ , 
oyM'.y M.gM’.y Mm’. 








A 
e ae 0000 
tr Aoooo — A F ss 
a Mgo/m',.m’ .{My.M.g 
ne | A? _—‘2H,H, Hi,’ bx KK? 2K, Kk, 1 kK, 


M.'s.  Mg.M',. Mz.M’s. M.yM'.y = M.gm’.. —- M.gM’ 5 























a Noo 
Ne |i? ™: — 28H + Hy Ee} x [ice St — 2K + KS 
* ie, Sass * as * Sa 142 1 ms 
meee (127). 


When the table is symmetrically divided in both categories and r = 0 we have 
Ms. = m',., H, = H,, K, = Kz, etc. and the above reduces to 








; N?H?K2 .4 (Ts — 1) (= - 1) 4N2H2K2 "2: M2 4N* H? K? 
: My. Mey Ny. ; Ney 
and o, = ae id at eae Nida Staal (128) 


§ 10. COMPARISON OF THE STANDARD DEVIATIONS OF POLYCHORIC r
AND ENNEACHORIC r.


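We may now compare the standard deviations of $r_p$ and $r_e$.


$-(\chi_{11} - \chi_{12} - \chi_{21} + \chi_{22})\,\delta r_e = \delta P_{11} - \delta P_{12} - \delta P_{21} + \delta P_{22}$ (129),
$-(C_{11} + C_{12} + C_{21} + C_{22})\,\delta r_p = \dfrac{C_{11}}{\chi_{11}}\,\delta P_{11} + \dfrac{C_{12}}{\chi_{12}}\,\delta P_{12} + \dfrac{C_{21}}{\chi_{21}}\,\delta P_{21} + \dfrac{C_{22}}{\chi_{22}}\,\delta P_{22}$ (130),


from which it appears as before (p. 108) that the enneachoric r is equivalent to a
polychoric r in which the weights of the r's are $\chi_{11}, -\chi_{12}, -\chi_{21}, \chi_{22}$, i.e.
$r_e = \dfrac{\chi_{11}r_{11} - \chi_{12}r_{12} - \chi_{21}r_{21} + \chi_{22}r_{22}}{\chi_{11} - \chi_{12} - \chi_{21} + \chi_{22}}$ (131).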
Hence also the standard deviation of the enneachoric r may be written 
a,” = (— eu -E aPi2 + gen — apes)” Ny + etc. fee teetenees (132). 
Upon expansion this reduces to 
ate ea ag lewis i) + ae 
: + gPo — N (Ag + Buy 1) — P22 + N (Age + Byp—1)] ™ ; 
aad Ne {(Ae ee A,,) 7 (Axe ar Ay) + (Boe i ink + ete. 
— (By — By)} — (Pu — Piz — Por + Pos) ” 





P,, — Pig — Poy + Ps) |? 
= [wv {on — A120 + Bos na Boy a Puc Pn Pa Pa Ny + ete. ...(133) 


(using a, B in the sense of p. 99), which is identical with the corresponding 
coefficient in o,? as given in equation (30). 
§ 11. FORMULAE FOR COMPUTATION.


The forms found for S are not convenient for computation. The following have
been found more expeditious. Various other formulae are also collected for reference:


$\chi_{st} = N z\,(h_s, k_t, r) = \dfrac{N}{2\pi\sqrt{1 - r^2}}\; e^{-\frac{h_s^2 - 2r h_s k_t + k_t^2}{2(1 - r^2)}}$ (134),
k, — rh, 
1-0 (4%), 
? V1—?7 





mn G a ) 


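$\Pi_{st} = A_{st} + B_{st} - 1$,
$P_{st} = A_{st}\,m_{s\cdot} + B_{st}\,m_{\cdot t} - m_{st}$,
when no quadrantal suffix is used a is understood.

$S_{11\cdot 11} = \Pi_{11}^2\,a_{11} + A_{11}^2\,c_{11} + B_{11}^2\,b_{11} - \dfrac{P_{11}^2}{N}$ (135),
and similarly for $S_{12\cdot 12}$ etc.,

$S_{11\cdot 12} = \Pi_{11}\Pi_{12}\,n_{11} + B_{11}B_{12}\,(n_{21} + n_{31}) + A_{11}\Pi_{12}\,n_{12} + A_{11}A_{12}\,n_{13} - \dfrac{P_{11}P_{12}}{N}$ (136),

$S_{11\cdot 21} = \Pi_{11}\Pi_{21}\,n_{11} + B_{11}\Pi_{21}\,n_{21} + B_{11}B_{21}\,n_{31} + A_{11}A_{21}\,n_{12} + A_{11}A_{21}\,n_{13} - \dfrac{P_{11}P_{21}}{N}$ (137),

$S_{11\cdot 22} = \Pi_{11}\Pi_{22}\,n_{11} + B_{11}\Pi_{22}\,n_{21} + B_{11}B_{22}\,n_{31} + A_{11}\Pi_{22}\,n_{12} + A_{11}A_{22}\,n_{13} - \dfrac{P_{11}P_{22}}{N}$ (138),

$S_{12\cdot 21} = \Pi_{12}\Pi_{21}\,n_{11} + B_{12}\Pi_{21}\,n_{21} + B_{12}B_{21}\,n_{31} + \Pi_{12}A_{21}\,n_{12} + B_{12}A_{21}\,n_{22} + A_{12}A_{21}\,n_{13} - \dfrac{P_{12}P_{21}}{N}$ (139),

$S_{12\cdot 22} = \Pi_{12}\Pi_{22}\,(n_{11} + n_{12}) + B_{12}\Pi_{22}\,(n_{21} + n_{22}) + B_{12}B_{22}\,(n_{31} + n_{32}) + A_{12}A_{22}\,n_{13} - \dfrac{P_{12}P_{22}}{N}$ (140),

$S_{21\cdot 22} = \Pi_{21}\Pi_{22}\,(n_{11} + n_{21}) + B_{21}B_{22}\,n_{31} + A_{21}\Pi_{22}\,(n_{12} + n_{22}) + A_{21}A_{22}\,(n_{13} + n_{23}) - \dfrac{P_{21}P_{22}}{N}$ (141),

$\chi_{st}^2\,\sigma_{st}^2 = S_{st\cdot st}$,

$\sigma_{st} = \dfrac{\sqrt{S_{st\cdot st}}}{\chi_{st}}$,

$\chi_{st}\chi_{s't'}\,\sigma_{st}\sigma_{s't'}\,R_{st\cdot s't'} = S_{st\cdot s't'}$ (142),

$R_{st\cdot s't'} = \dfrac{S_{st\cdot s't'}}{\chi_{st}\sigma_{st}\,\chi_{s't'}\sigma_{s't'}} = \dfrac{S_{st\cdot s't'}}{\sqrt{S_{st\cdot st}\,S_{s't'\cdot s't'}}}$ (143).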
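
The computing forms for the S's all follow one pattern, which may be sketched for a table of any size. The code below is an illustration under stated assumptions rather than the author's scheme: the function names are hypothetical, the frequencies are those of the "Pairs of brothers" table with an assumed orientation of the suffixes, and the values given to A and B are arbitrary numbers chosen only to exercise the formula. Each cell $n_{lm}$ is taken to carry the coefficient A (if $l \le s$) plus B (if $m \le t$) less 1 (if both), which is how $P_{st} = A_{st}m_{s\cdot} + B_{st}m_{\cdot t} - m_{st}$ resolves into cells.

```python
def cell_coefficient(cell, cross, A, B):
    """Coefficient of n_lm in P_st = A m_s. + B m_.t - m_st."""
    l, m = cell
    s, t = cross
    in_first = l <= s      # the cell is part of the marginal total m_s.
    in_second = m <= t     # the cell is part of the marginal total m_.t
    return A * in_first + B * in_second - (in_first and in_second)


def mean_product_S(n, cross1, AB1, cross2, AB2):
    """S for two cross points: sum of coefficient products times n, less P P' / N."""
    N = sum(n.values())
    total = P1 = P2 = 0.0
    for cell, freq in n.items():
        k1 = cell_coefficient(cell, cross1, *AB1)
        k2 = cell_coefficient(cell, cross2, *AB2)
        total += k1 * k2 * freq
        P1 += k1 * freq
        P2 += k2 * freq
    return total - P1 * P2 / N


# Frequencies keyed by the suffixes (l, m) of n_lm; the orientation is assumed.
n = {(1, 1): 906, (2, 1): 20, (3, 1): 140,
     (1, 2): 20, (2, 2): 76, (3, 2): 9,
     (1, 3): 140, (2, 3): 9, (3, 3): 370}

# Arbitrary illustrative values of (A, B) at the cross points 11 and 12.
AB11 = (0.70, 0.65)
AB12 = (0.80, 0.60)

S_11_12 = mean_product_S(n, (1, 1), AB11, (1, 2), AB12)
```

The value `S_11_12` obtained in this way may be compared with equation (136) written out term by term; the same function with both cross points equal gives the S of a single division.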
In place of calculating σ and R it will be found easier to employ the S's directly
by writing the equations for C in the form

$\dfrac{C_{11}}{\chi_{11}^2}\,S_{11\cdot 11} + \dfrac{C_{12}}{\chi_{11}\chi_{12}}\,S_{11\cdot 12} + \dfrac{C_{13}}{\chi_{11}\chi_{13}}\,S_{11\cdot 13} + \ldots = \lambda$,
$\dfrac{C_{11}}{\chi_{11}\chi_{12}}\,S_{11\cdot 12} + \dfrac{C_{12}}{\chi_{12}^2}\,S_{12\cdot 12} + \dfrac{C_{13}}{\chi_{12}\chi_{13}}\,S_{12\cdot 13} + \ldots = \lambda$ (144).


Eliminate the λ by subtracting one equation from each of the others; put
$C_{11} = 1$ and solve by successive elimination for the remaining C's. This is preferable
to using the determinant as it is at least no more laborious and lends itself to various
checks for accuracy. The λ should be determined from each of the equations as a
further check. Then we have

$\sigma_{r_p}^2 = \dfrac{\lambda}{\Sigma\,(C_{st})}$ (145).


By putting $C_{11} = \chi_{11}$, $C_{12} = -\chi_{12}$, $C_{21} = -\chi_{21}$, $C_{22} = \chi_{22}$, we may derive the
enneachoric standard deviation from the polychoric r in a form convenient for
computation in terms of the S's,

$(\chi_{11} - \chi_{12} - \chi_{21} + \chi_{22})^2\,\sigma_{r_e}^2 = S_{11\cdot 11} + S_{12\cdot 12} + S_{21\cdot 21} + S_{22\cdot 22} + 2\,(S_{11\cdot 22} + S_{12\cdot 21})$
$\quad - 2\,(S_{11\cdot 12} + S_{11\cdot 21} + S_{12\cdot 22} + S_{21\cdot 22})$ (146).
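
The solution of the equations for the C's can be sketched as follows. This is not the author's working scheme but an equivalent route, assumed here for brevity: the system with unit right-hand side is solved by ordinary elimination and the result rescaled so that the first weight is unity, after which λ and the variance follow as in (145). The function names and the small matrices of S's are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting on a small dense system."""
    size = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, size))
        x[i] = (aug[i][size] - tail) / aug[i][i]
    return x


def minimum_variance_weights(S, chi):
    """Weights C (first weight = 1), the multiplier lambda and the variance.

    S[i][j] is the mean product of the i-th and j-th divisions and chi[i]
    the corresponding chi; the equations solved are those of the form (144).
    """
    size = len(chi)
    M = [[S[i][j] / (chi[i] * chi[j]) for j in range(size)] for i in range(size)]
    x = solve_linear(M, [1.0] * size)
    lam = 1.0 / x[0]
    C = [value / x[0] for value in x]
    variance = lam / sum(C)
    return C, lam, variance


# Hypothetical S's for two divisions, with chi taken as unity for simplicity.
S_example = [[2.0, 1.0], [1.0, 3.0]]
chi_example = [1.0, 1.0]
C, lam, variance = minimum_variance_weights(S_example, chi_example)
```

Substituting the C's back into each equation should return the same λ, which is the check recommended in the text.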


§ 12. COMPARATIVE RESULTS OF VARIOUS METHODS OF FINDING r FROM A
3 × 3 TABLE.

In testing the methods developed in this paper upon actual material it was 
thought desirable to try them side by side with all the other methods of finding 
the correlation coefficient so that some indication could be got of their comparative 
accuracy. Each of the tables was therefore dealt with by nine methods which are 
indicated in § 13. These tables were selected at the beginning of the investigation, 
and had the course which the research has taken been foreseen probably a different 
selection might have been made. Two of them, I and III, are normal tables with an 
arbitrary population of 1000. In Table I the frequencies have been taken to the 
nearest integer and in III to the nearest two places of decimals, so that any irregu- 
larity in them is due to the roughness of the approximation to the true figures. 
In the r,, we have an additional lack of approximation in taking r, from the curve* 
for determining r, and also in r,, r, and r, from finding the class index correlation 
from a small number of marginal groups. In IT and IV we have actual samples. 

A rough test of the value of the various methods may be made by finding 
the mean square deviation of the calculated from the “observed” value of r, each 
constituent being merely weighted with its total frequency, regarding the product 
moment values of r as the “observed” value. 

Thus let $n_1, n_2$ = total frequency in Tables I, II; $R_1, R_2$ = product moment
value of the correlation coefficient in Tables I, II; $r_1, r_2$ = correlation coefficient
calculated by one of the methods, then writing

$(n_1 + n_2 + \ldots)\,\Sigma^2 = n_1 (R_1 - r_1)^2 + n_2 (R_2 - r_2)^2 + \ldots$,
we shall have in $\Sigma^2$ a measure of the goodness of the various methods. This gives
the following values of $\Sigma^2$.


* Tables for Statisticians and Biometricians, p. \vii and p. 65. 
Mean weighted square deviation of calculated from "observed" or product
moment values of r.

    Method                                         Σ²        Σ² (omitting H)
    Mean contingency                             .00138        .00102
    Mean square contingency                      .00089        .00060
    Enneachoric r                                .00364        .00036
    Polychoric r                                 .00004        .00002
    Tetrachoric r                                .00018        .00016
    Mean tetrachoric r                           .00005        .00003
    Mean weighted tetrachoric r                  .00002        .00002
    Three row η from mean dispersion*            .00020        .00019
    Three row η from "individual" dispersion     .00151        .00144
    Marginal centroids                           .00215        .00255
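
The measure Σ² tabulated above is a frequency-weighted mean of squared differences and can be sketched in a few lines. The function name and the figures in the example are hypothetical; they are not values taken from the tables of this paper.

```python
def mean_weighted_square_deviation(totals, observed, calculated):
    """Sigma^2 = sum of n_i (R_i - r_i)^2 divided by the sum of the n_i.

    totals     : total frequency of each table
    observed   : product moment r of each table, taken as the "observed" value
    calculated : r found for each table by the method under test
    """
    weighted = sum(n * (R - r) ** 2
                   for n, R, r in zip(totals, observed, calculated))
    return weighted / sum(totals)


# Hypothetical figures for two tables, for illustration only.
example = mean_weighted_square_deviation(
    totals=[1000, 500],
    observed=[0.50, 0.60],
    calculated=[0.52, 0.57],
)
```

A method that reproduced the product moment value in every table would give Σ² = 0, so smaller values indicate closer agreement.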

















I have given the value of &? including and omitting Table H, which gives very 
anomalous results, as yet unexplained. Broadly the best results are given by Ty, 
Tm, and r,, and, Table H aside, the best result is by r,. In the case of r, the results 
are not quite satisfactory. The figure given was arrived at by taking the mean of 
the raw figure from the curve and the same corrected for broad categories as 
suggested in Tables for Statisticians and Biometricians. An attempt was made to 
find an empirical formula which would give better results with the tables here
described, but the result was not worthy of record. With three row η, although strictly
the method is quite inapplicable to 3 × 3 tables, it may be useful to notice that
when so applied the best results on the whole were got from assuming the 
distribution to be homoscedastic and using the mean dispersion of the arrays. This 
was largely due to several of the tables being divided so that some of the arrays 
contained very small frequencies which had therefore large probable errors, giving 
an undue effect on the result when squared. When such small frequencies are 
avoided the results appear to be about equally good. Of course our theory fails, 
as we have already pointed out, when any cell frequency is of the same order as 
its variation. 

Comparing the probable errors of r,,, 7,, and 7, (tabulated for convenience in the 
Appendix on page 133) it will be seen that on the whole they are in descending order 
of magnitude. They differ very little from each other and, considering the labour 
involved in finding r,, 7,,, would in most cases give a result with a sufficiently low 
probable error. 

The method of marginal centroids as already known is unsuited for tables with 
so few categories. 

An interesting and important relation which is not shown in the tables of
numerical results (§ 13) is the degree of correlation between $r_{11}$, $r_{12}$, etc., viz.
$R_{11\cdot 12}$, $R_{11\cdot 21}$, etc. These are collected in the table on p. 123.

All the enneachoric tables are arranged so that reading from $n_{11}$ to the right, and
downwards, r is positive so that the values R may be compared among each other.
It will be seen on examination that $R_{11\cdot 12}$, $R_{11\cdot 21}$, $R_{12\cdot 22}$, $R_{21\cdot 22}$ are on the whole
greater than $R_{11\cdot 22}$ and $R_{12\cdot 21}$, and of the two latter $R_{12\cdot 21}$ is usually the greater.


* See § 13, 8. 
    Table                R_11.12   R_11.21   R_12.22   R_21.22   R_11.22   R_12.21

    A                     .3418     .3041     .3196     .3450     .0608     .1466
    B                     .6977     .6040     .6132     .7050     .4048     .5231
    C                     .4880     .4550     .4751     .4821     .1872     .2614
    D                     .5100     .5678     .6038     .4907     .2620     .3221
    E                     .2180     .4953     .5378     .1813     .0830     .1217
    F                     .4252     .4422     .4133     .4307     .1209     .2392
    G                     .6633     .6723     .6565     .6701     .3414     .4769
    H                     .4395     .6592     .6567     .4367     .2391     .3219
    K                     .3842     .3488     .3656     .3882     .1124     .1632
    L                     .5014     .4831     .4798     .5162     .2276     .2611
    M                     .5282     .2820     .2961     .5213     .0865     .2128
    Pairs of brothers     .8203     .8203     .8732     .8732     .8837     .8756











With regard to the computation of the polychoric r it will be seen from the example appended to
Table A that the amount of labour involved in dealing even with a 3 × 3 table is
considerable and will rapidly increase with the number of cells, and it is very
desirable that some short method of approximating to the weights (C's) of the r's
be devised. For the present it may be of interest to give here the C's for the various
tables used.


























    Table                 C_11      C_12       C_21       C_22
    A                      1       .51333     .38587     .71892
    B                      1       .26872     .19654     .55240
    C                      1       .27000     .26993     .34266
    D                      1       .65737     .15786     .85445
    E                      1      3.79397     .16186    3.81496
    F                      1       .56452     .43667     .72594
    G                      1      -.02973    -.13835     .69769
    H                      1       .72919     .33838     .79708
    K                      1       .59665     .48857     .66355
    L                      1       .39951     .34626     .27119
    M                      1       .36146     .38718     .74467
    Pairs of brothers      1      -.26399    -.26399     .82793








The case of Table G, with negative weight for $r_{12}$ and $r_{21}$, is suggestive and needs
further study. The table has the characteristic that the mean is in $n_{11}$ and the
marginal frequencies are decreasing in magnitude and nearly equal in both sets of
categories. The table "Pairs of brothers" which is accompanied by similar weights
is taken from Biometrika, vol. III, 1904, p. 182, and is given below. It compares the
athletic capacities of pairs of brothers.


                                   First Brother

Second Brother     Athletic    Betwixt    Non-athletic    Total

Athletic              906         20          140          1066
Betwixt                20         76            9           105
Non-athletic          140          9          370           519

Total                1066        105          519          1690
$r_{11} = .8046 \pm .0126$,
$r_{12} = .7190 \pm .0162$,
$r_{21} = .7190 \pm .0162$,
$r_{22} = .8028 \pm .0132$.

$C_{11} = 1$, $C_{12} = -.26399$, $C_{21} = -.26399$, $C_{22} = .82793$.

$r_p = .8382 \pm .0122$.

These negative weights require further investigation, particularly the conditions
for the existence of zero weights, but it is clear that certain divisions are to be
avoided in determining r from a 3 × 3 table.

On the whole $C_{11}$, $C_{22}$, $C_{12}$, $C_{21}$ are in descending order of magnitude.


§ 13. PRÉCIS OF THE METHODS OF FINDING THE COEFFICIENT OF CORRELATION.


1. +r,. Mean contingency, corrected for class index correlation. 

2. 14. Mean square contingency, corrected for class index correlation and 
where necessary for the number of cells. 

3. $r_e$. By selecting the central cell, the method first described in this paper.
As its use treats any table as virtually 3 × 3, it may be called enneachoric r.

4. $r_p$. By weighting the r’s so that the P.E. shall be a minimum, the second
method described in this paper. As it is applicable to tables of any size it may be
called polychoric r.

5. $r_{11}, r_{12}, r_{21}, r_{22}$. Tetrachoric r of the various quadrants. The probable errors
were calculated by the complete formula (P.E.) and also by the approximate method
(A.P.E.). (Tables for Statisticians and Biometricians, p. x1.)

6. $r_m$. The unweighted mean of $r_{11}, r_{12}, r_{21}, r_{22}$.

7. $r_w$. The mean of the $r_{11}$, etc., weighted by the reciprocals of the squares of
their standard deviation.

8. $\eta_{k_1}, \eta_{k_2}, \eta_{h_1}, \eta_{h_2}$. Three row η calculated from each of the dividing planes as
planes of reference with a class index correction on the foot of the columns. Since
the standard deviation may be found in this case from the individual arrays or,
assuming the distribution sufficiently homoscedastic, may be given the mean
value, the two results are distinguished by the headings “individual dispersion”
and “mean dispersion” respectively *.

9. 1,. By marginal centroids. 

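The probable error of $r_m$ and $r_w$ was obtained as follows:

Let the correlation coefficients $r_{11}, r_{12}, \ldots$ have the S.D.’s $\sigma_{11}, \sigma_{12}, \ldots$
and the weights $t_{11}, t_{12}, \ldots$.

Then
$$r_w = \frac{\Sigma\, t_{11} r_{11}}{\Sigma\, t_{11}}, \qquad (dr_w)^2 = \frac{\Sigma\, t_{11}^2 (dr_{11})^2 + 2\Sigma\, t_{11} t_{12}\, dr_{11}\, dr_{12}}{(\Sigma\, t_{11})^2},$$
or
$$\sigma_w^2 = \frac{\Sigma\, t_{11}^2 \sigma_{11}^2 + 2\Sigma\, t_{11} t_{12} \sigma_{11} \sigma_{12} R_{11\cdot12}}{(\Sigma\, t_{11})^2} \qquad (147).$$

When $t_{11} = t_{12} = \ldots = 1$ we have the mean r, $r_m$, and if there are $l$ r’s
$$\sigma_m^2 = \frac{\Sigma\, \sigma_{11}^2 + 2\Sigma\, \sigma_{11} \sigma_{12} R_{11\cdot12}}{l^2} \qquad (148),$$
which for convenient computation may be written
$$\sigma_m^2 = \frac{1}{l^2}\left\{\Sigma\, \frac{S_{11\cdot11}}{\chi_{11}^2} + 2\Sigma\, \frac{S_{11\cdot12}}{\chi_{11}\chi_{12}}\right\}.$$

In finding the mean weighted r we may regard $r_{11}$ as the mean of $t_{11}$ uncorrelated
values of r of equal weight each having the S.D. $\sigma_0$. Hence $\sigma_{11}^2 = \sigma_0^2/t_{11}$ and $t_{11} = \sigma_0^2/\sigma_{11}^2$,
i.e. the weights are proportional to the reciprocals of the squares of the S.D.’s.
Putting this value in (147) we have
$$\sigma_w^2 = \frac{\Sigma\, \dfrac{1}{\sigma_{11}^2} + 2\Sigma\, \dfrac{R_{11\cdot12}}{\sigma_{11}\sigma_{12}}}{\left(\Sigma\, \dfrac{1}{\sigma_{11}^2}\right)^2}.$$

* The probable error of Biserial (or three row) η has now been given (Biometrika, Vol. IX, part IV),
but too late for use in the present paper.
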
§ 14. DETAILS OF TABLES AND SUMMARY OF NUMERICAL RESULTS. 
I. The first table examined was taken from Pearson and Heron’s paper “On
Theories of Association,” Biometrika, vol. IX, p. 220, Table XIV, and is a Gaussian
surface for r = .5 adjusted to give whole units in the cells.














} | | 
| Bee ae 3 4 | 5+6 7 8 | Total 
} | | 
} | | 
Bil ae. 5 2 < = - | 34 
ee a 79 36 10 9 1 | 301 
a ae 85 54 19 22 4 | 284 
| at ee. 32 39 31 12 17 4 | 137 
| 6+6) — 18 28 25 ll 18 5 | 105 
a a ee ee 22 24 12 22 7 | 98 
| JO Pingel Woe Sis 8 5 13 7 | 41 
| Total | 36 322 | 264 180 69 101 28 | 1000 


The frequency in heavy type contains the mean of the surface. 
A. Table I divided so that the mean falls in cell $n_{22}$.








5+64+7+4+8 










ry > +482 
r, = 4840 + -0170 A.P.E. 
r, = 50346 + -02094 \ 7,, = -498 + -02872 (-0290) 
r, = 48594 + -04918 Tie = °510 + -03210 = (-0303) 
1m = 5050 + -0246 | r., = -508 + -03505 (-0321) 
tT, = 5045 + -0211 Too = °504 + -03259 (-0340) 
r, = 5145 
Mean Individual 
dispersion dispersion 
Ns 5031 “4950 
ks “5057 4975 
™ “5045 -4955 
Nhe -5058 +4942 


I here insert as an illustration of the new method the constants required in
finding $r_p$ for the above table, and the calculation of $S_{12\cdot21}$, $P_{11}$ and $\Pi_{11}$, and the
equations to find the C’s.





























Table A | hk, | hake hak, hake 
—_——_ — | —_——_—— 
r | -498 ‘510 -508 -504 
h | --36381 ~-36381 -84879 -84879 
rh | --18118 ~+18554 -43117 -42779 
Eh |  +8733945 ‘3733945 -2782707 | -2782707 
k | --42615 69349 ~-42615 | 69349 
rk | —+21222 -35368 ~-21648 “34952 
Ek | +8643145 ‘3136735 ‘3643145 ‘3136735 
h-rk ~+15159 ~-71749 1-06527 -49927 
h-rk 
Vi =, —-1748086 ~-8341218 1-236735 ‘5780570 
h-rk 
Via -3928931 -2817266 -1856865 -3375594 
h—-rk | 
B=E€7—, | 4306151 -2021063 ‘8919072 -7183872 
k-rh ~+24497 -87903 — -85732 -26570 
k-th 
Vi = --2824913 | —1-021920 - 9953132 -3076287 
k—rh | 
Py er ‘3833377 | ~=—--2366676—S | «— -2431048 ‘3805048 
k—rh A 
A=€ oe ‘3887835 | 8465906 1597920 -6208174 
x: | 165-0602 102-7353 78-53720 122-5923 
I — -1806014 0486969 0516992 -3392046 
P | 90-44050 | 128-8716 | 111-9422 383-0020 
Sy/x x:* | 001812696 | -000692554 |  -0006728030 -0001333663 
Si/x:x: | — -002265285 | -0003626002 -0007349840 
Slu:x: | — — -002700292 -0008662932 
Soo/x.X — — — (002335123 
C: 1 51333 “3859 -71892 
* The suffix : indicates that appropriate suffix is to be taken from the column, 





























II. 0486969 
II,, 0516992 
My, 193 


ae 
By 2021063 
Tl,, 0516992 
22 
My (122 1.074745 
By, —--2021063 
B,, 8919072 
D) 
Mm 20 3.605901 
Tl,  -0486969 
Ay, 1597920 
Me 134 1042704 


Calculation of P,,. 
Ay, 3887835 





m,. 358 139-1845 

2. -4306151 

m., 335 144-2560 
283-4405 

My, 193 

P= “90-4405 


Equations to find C, 




Calculation of S4.21- 


Bis -2021063 
Ay 1597920 
Ma 209 

Ax 8465906 
As 1597920 
M13 31 


Py, 128-8716 
P,, 111-9422 
+N 1000 


X12 (1027353 








6-749650 


4-193630 
17-351825 


14-426170 
2-925655 


X21 78-5370 + 8068-543 








Ay, 3887835 








B,  °4306151 
-8193986 

1 
Il, = -1806014 





Calculation of Ty. 


$.001812696\,C_{11} + .000692554\,C_{12} + .000672803\,C_{21} + .000133366\,C_{22} = \lambda,$
$.000692554\,C_{11} + .002265285\,C_{12} + .000362600\,C_{21} + .000734984\,C_{22} = \lambda,$
$.000672803\,C_{11} + .000362600\,C_{12} + .002700292\,C_{21} + .000866293\,C_{22} = \lambda,$
$.000133366\,C_{11} + .000734984\,C_{12} + .000866293\,C_{21} + .002335123\,C_{22} = \lambda.$

The solution of these equations gives the first row of figures in the C table on
page 123.
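
The four equations can be checked numerically. The following is a minimal sketch only, written in Python rather than in anything used in the paper. It assumes that the right-hand sides are one common constant, that the solution is scaled so that $C_{11} = 1$, and that the probable error is 0.67449 times the standard deviation of the weighted mean; the helper `solve` is a hypothetical name introduced for the illustration.

```python
# Sketch: recover the weights C for Table A from the four printed equations.
# Assumptions (not stated in this form in the text): the system is
# S C = lambda * (1, 1, 1, 1), the solution is scaled so that C11 = 1, and
# P.E. = 0.67449 * S.D.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

# Coefficients of the equations (order: 11, 12, 21, 22).
S = [
    [0.001812696, 0.000692554, 0.000672803, 0.000133366],
    [0.000692554, 0.002265285, 0.000362600, 0.000734984],
    [0.000672803, 0.000362600, 0.002700292, 0.000866293],
    [0.000133366, 0.000734984, 0.000866293, 0.002335123],
]
r = [0.498, 0.510, 0.508, 0.504]   # tetrachoric r11, r12, r21, r22 of Table A

x = solve(S, [1.0] * 4)            # proportional to the weights
C = [v / x[0] for v in x]          # scale so that C11 = 1
r_p = sum(c * ri for c, ri in zip(C, r)) / sum(C)   # weighted mean of the r's
pe_p = 0.67449 * (1.0 / sum(x)) ** 0.5              # probable error of r_p

print([round(c, 4) for c in C], round(r_p, 4), round(pe_p, 4))
```

Under these assumptions the weights, the weighted mean and its probable error can be compared with the figures printed for Table A.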


B. Table I divided so that the mean falls in cell ,;. 








              1+2+3      4      5+6+7+8    Total

1+2+3          462      92        65        619
4               73      31        33        137
5+6+7+8         87      57       100        244

Total          622     180       198       1000

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ry = 533 
rg = 510 + -0161 A.P.E. 
t, = ‘5007 + -0250 \ 7,, = -499 + -028 (-028) 
r, = ‘5073 + °1437 | 71. = -501 + -030 = (-030) 
Tm = °5010 + -0254 | 72, = 500 + -031 = (-032) 
Ty = *5008 + -0253 } ry. = -504 + -033 = (-034) 
T, == °5445 
Mean Individual 
dispersion dispersion 
Nk, -4921 -4873 
Nes 4917 -4763 
Nh, 4858 -4802 
mn, -4881 -4671 
C. Table I divided so that the mean falls in cell n,,. 
              1+2+3    4+5+6    7+8

1+2+3          462      121      36
4+5+6          119       79      44
7+8             41       49      49

Total          622      249     129
ry = 537 
rg = 5183 + -0202 A.P.E. 
Ty = *4988 + -0241 11, = °499 + -028 (-028) 
r, = 5055 + -0652 | 71, = -490 + -035 = (-035) 
m == *4985 + -0253 | To, = 505 + 035 (-035) 
tT» = 4973 + -0311 ) ro. = -500 + -039 = (-043) 
rT, = 5480 
Mean Individual 
dispersion dispersion 
nk, -4895 ‘5041 
Nk -4934 -4615 
Mh, -4846 -4730 
Nhe *4947 +4496 
D. Table I divided so that the mean falls in cell n,9. 
| 
1+2+3 4 5+6+7+8 
| 
| 1+2 277 38 20 










| Total 


| 
| 
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ry = °482 + -0631 A.P.E. 

ry = 0018 + -0235 \ 7,, = 501 + -030 = (-030) 

r, = ‘5173 + +2330 | 7,4. = -499 + -028 (-029) 

Tm = 5030 + -0239 | r., = -508 + -035 (-032) 
( 





Ty = *5025 + -0236 } ry. = -504 + -031 (-032) 
r, = +5098 
Mean Individual 
dispersion dispersion 
| k, 5211 -4798 
| Ne 4885 -4907 
Mh, -4956 -4709 
"hp *5115 “4553 


E. Table I divided so that the frequency of $n_{22}$ differs very little from the
frequency of a table with the same marginal frequencies but of zero correlation.
The mean is in cell ng. 


























                  1+2      3     4+5+6+7+8    Total

1                  27      5         2          34
2                 166     79        56         301
3+4+5+6+7+8       165    180       320         665

Total             358    264       378        1000

Here $\frac{301 \times 264}{1000} = 79.464$, so that the constant term in the equation for r is a
small quantity and any error of sampling will have an excessive weight. It will be
found, as one might expect, that the P.E. of $r_e$ is very large.

The very large value of $\eta_{h_2}$ is due to the column having the marginal total 378,
for the frequency 2 in it is the nearest whole number to a true value and being
so small, a small absolute difference makes a large fractional value, resulting in
a large difference between the true and apparent standard deviations of this
particular array. Actually the method applied is inapplicable to a frequency of this
order.

ry = +502 
ry = °4827 + -0246 A.P.E. 
ry = 4991 + -0245 } ry, = 500 + 056 (-054) 
r, = 4658 + -4371 |. ry. = -498 + -029 (-029) 
Ym = °4995 + -0327 [{ r, = -500 + -072 = (-054) 
) 





Typ = 4995 + -0247 } rep = 500 + -030 (-029 
r, = 5065 

Mean Individual 

dispersion dispersion 
Nk, -4862 -4906 
Ns -5169 -4991 
| Mh, -4950 5022 
"hs -4966 ‘7150 
II. The second table examined was taken from Macdonell’s paper “On Criminal
Anthropometry,” Biometrika, vol. I, p. 216. The original table is too extensive to be
given here, but may be found in loc. cit. The horizontal categories are the heights of
3000 criminals in feet and inches, and the vertical categories the lengths of their
left middle fingers in millimetres. The correlation coefficient found by the product
moment method is .6608 ± .0069.


F. Table II divided so that the mean falls in cell np. 












































55,%;""-64 5” 64°,”"-669,”” 66,%,"-77” Total 
be Ei | 
9-4-11-3 mm. 682 270 101 | 1053 
11-4~-11-7 mm. 282 351 286 919 
| 11-8-13-5 mm. 90 299 639 1028 
| Total 1054 920 1026 | 3000 
ry = 6635 
ry = 6170 + -0075 A.P.E. 
r, = -6544 + -0101 ry = 667 4-013 (-014) 
r, = 6316 + -0301 | r,,=-670 4-014 (-013) 
rm = +6530 + -0101 [ rp, = 6444-015 (-014) 
ry = +6538 + -0101 ) 799 = “631 4-014 (-014) 
r, = 6911 
Mean Individual 
dispersion dispersion 
"ky 6477 “6295 
Nhs 6548 -6306 
Nh, -6647 ‘6151 
"he 6345 6510 
G. Table II divided so that the mean falls in cell n,,. 
| | 
| 55,%,"-65,9,"” | 65,%"-66,%,” | 66,%-77” Total | 
9-4-11-5 mm. | 1122 176 216 1514 | 
11-6-11-7 mm. | 191 96 171 458 
11-8-13-5 mm. | 203 186 639 1028 
Total | aie | «458 | 1028 | 3000 | 
Ty = 731 
r, = 6426 + -0077 A.P.E. 
tp = 6613 + -0108 ) ry, = “680 + 012 (-013) 
r, = -6808 + -0390 | r,, = -668 + -013 (-013) 
%m = “6553 + -0111 | fa = 642 + -014 (-014) 
» = 6573 + -0112 } rq = -631 + 014 (-014) 
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Mean Individual 

dispersion dispersion 
Nk, 6553 -6421 
"ks -6473 -6546 
hy -6657 -6669 
"he *6277 -7113 


H. Table II divided so that the mean falls in cell n,,. 






































55 ,”-65,%,”" | 65,%,-66,%," | 66,%,”-77” Total 
_ 
9-4-11-3 mm. 840 112 | 101 1053 
11-4-11-7 mm. 473 160 286 919 
11-8-13-5 mm. 203 186 639 1028 
Total an 1516 458 1026 3000 
ry = +669 
ry = ‘6172 + -0088 A.P.E. 
ry = 6479 + -0107 | r,, = -648 + -014 (-014) 
r, = 5162 + -0438 | r,, = -668 + -013 (-013) 
1m = °6478 + -0108 f To = 644 + 015 (-014) 
ry, = 6591 + -0107 ) rap = -631 + -014 = (-014) 
r, = °7920 
Mean Individual 
dispersion dispersion 
Mk, *6539 -6279 
Ny -6472 6156 
hy “6479 6103 
"he 6345 -6418 


III. The third table examined was taken from Pearson and Heron’s paper
“On Theories of Association,” Biometrika, vol. IX, p. 219, Table XIII, and is a
normal surface having r = .3. The values of the frequencies to two places of
decimals were used.





































































































K. Table III divided so that the mean falls in cell np. 
              1+2       3+4      5+6+7+8    Total

1+2          162.20    135.25     37.55      335
3+4          142.42    195.10     83.48      421
5+6+7+8       53.38    113.65     76.97      244

Total        358       444       198        1000
ry = °3088 
ry = +2960 + -0199 A.P.E 
fy = *3000 + '-0246 \ 7,, = -300 + -033 (-034) 
r, = 3000 + -0991 | 7,, = -300 + -036 = (-035) 
Tm = °3000 + +0249 [ oy = *300 + -038 = (-037) 
Ty = 3000 + -U247 Too = *300 + -038 = (-039) 
re = 3025 
Mean Individual Mean Individual 
dispersion dispersion dispersion dispersion 
My “3011 -2988 "ms -2998 +2997 
Nhe -2999 -3003 Nhs +2989 -2989 
L. Table III divided so that the mean sii in cell n4;. 
CERES NN, | beh abides | 
| 14243 4+5+6 | 7+8 | Total 
| 4 | ry 
1+2+3 429-68 134-70 54-62 | 619 
44+5+6 | 132-38 69-81 | 39-81 | 242 
7+8 59-94 | 44-49 34:57 139 
seer es Me ioreets 
Total | 622 | 249 | 129 1000 
Ty = -330 
re = °3095 + -0225 A.P.E. 
ry = *3000 + -0282 1 = °300 + -032 = (-033) 
r, = °3000 + -1031 | ry. = -300 + -039 = (-040) 
1m = °3000 + -0326 | 72; = 300 + -040 = (-041) 
ry = 3000 + -0285 } ro. = -300 + -046 = (-050) 
r, = 3152 
Mean Individual 
dispersion dispersion 
me, 2965 -2975 
Ns -2910 -2898 
Mh, 2953 :2975 
"h, -2909 2871 


IV. The fourth table examined was from Pearson and Lee’s paper “On the
Distribution of Frequency (Variation and Correlation) of Barometric Heights at
Divers Stations,” Phil. Trans. A, 1897, vol. 190, p. 453, Table IX. The original
table is too extensive for reproduction and may be found in loc. cit. A condensed
form of it will be found in Biometrika, vol. IX, 1913, p. 223, Table XVIII.

This was selected as an example of a very skew distribution. The correlation
coefficient found by Product Moments is .780 (Biometrika, vol. IX, p. 223).


M. Table IV divided so as to give a reasonably large frequency in the cell nyo. 
The mean falls in the cell n,,. 






































; 30:1’ andover ; 30’-29-8” 29-7” and under Total 
29-9” and over 1086-5 412 43 1541-5 
29-8’’-29-7” 144-5 275 103 §22-5 
29-6” and under 56-5 323 478-5 858-0 
Total | 1287-5 1010 624:5 2922 
Ty = -789 
rT, = °7504 + -0210 A.P.E. 
Y, = *7864 + -0077 \ 7,, = -780 + -010 = (-019) 
r, = °7745 + -0151 | ry. = -787 + -011 = (-011) 
Tm = *T877 + -0078 | 72, = -795 + -012 = (-011) 
Ty = °7858 + -0077 } ro = °785 + -011 = (-012) 
r, = 8770 
Mean Individual Mean Individual 
dispersicn dispersion dispersion dispersion 
nh, -7857 ‘7116 h, -7962 ‘7417 
Nk» -7812 -6951 Nhe -8065 -6841 
Appendix.

Probable errors of $r_m$, $r_w$, $r_p$.

        P.E. of              P.E. of             P.E. of
        arithmetic           weighted            polychoric r
        mean ($r_m$)         mean ($r_w$)        ($r_p$)

A         .0246                .0211               .0209
B         .0254                .0253               .0250
C         .0253                .0311               .0241
D         .0239                .0236               .0235
E         .0327                .0247               .0245
F         .0101                .0101               .0101
G         .0111                .0112               .0108
H         .0108                .0107               .0107
K         .0249                .0247               .0246
L         .0326                .0285               .0282
M         .0078                .0077               .0077


My thanks are due to Professor Pearson, who suggested the enquiry, for his
ever ready help and advice throughout the work. I have also to thank Miss Alison
Robertson for assistance in reading the proofs.











ON A FORMULA FOR THE PRODUCT-MOMENT COEFFICIENT 
OF ANY ORDER OF A NORMAL FREQUENCY DISTRIBUTION 
IN ANY NUMBER OF VARIABLES. 


By L. ISSERLIS, D.Sc. 


1. In Biometrika, Vol. XI, Part III, I have shown that for a normal frequency
distribution in four variables, if

$$p_{xyzt} = S\{n_{xyzt}\, xyzt\}/N$$

denotes the product-moment coefficient of the distribution about the means of the
four variables and $q_{xyzt}$ is the reduced moment, i.e.

$$q_{xyzt} = p_{xyzt}/\sigma_x\sigma_y\sigma_z\sigma_t,$$

then
$$q_{xyzt} = r_{xy}r_{zt} + r_{xz}r_{yt} + r_{xt}r_{yz} \qquad (1).$$

In this result any two or more variables may be made identical, leading to a
variety of results for moment coefficients of distributions containing fewer than
four variables but of total order four; for example, identifying $t$ with $x$ we obtain

$$q_{x^2yz} = r_{yz} + 2r_{xy}r_{xz} \qquad (2),$$

and putting $y = z = t = x$ we find $q_{x^4} = 3$; of course $q_{xy} = r_{xy}$ and $q_{x^4}$ is merely $\beta_2$.

I suggested that (1) was probably capable of generalisation, and I now propose
to prove a general theorem which gives immediately the value of the mixed moment
coefficient of any order in each variable for a normal frequency distribution in any
number of variables.

2. Consider a normal distribution, total population $N$. Let $N_{12\ldots n}$ denote the
frequency of the group in which the characters differ by $x_1, x_2, \ldots x_n$ from the mean
values for the whole population and let

$$p_{1^{l_1}2^{l_2}\ldots n^{l_n}} = S\,(N_{12\ldots n}\, x_1^{l_1} x_2^{l_2} \ldots x_n^{l_n})/N \qquad (3)$$

denote the moment coefficient of the most general kind about the mean values of
the characters. The corresponding reduced moment will be

$$q_{1^{l_1}2^{l_2}\ldots n^{l_n}} = p_{1^{l_1}2^{l_2}\ldots n^{l_n}}/\sigma_1^{l_1}\sigma_2^{l_2}\ldots\sigma_n^{l_n} \qquad (4).$$

Then for normal distributions,
$$q_{123\ldots n} = 0 \text{ when } n \text{ is odd} \qquad (5),$$
and if $n$ be even,
$$q_{123\ldots n} = S\,(r_{ab}r_{cd}\ldots r_{hk}) \qquad (6),$$

where the summation on the right-hand side extends to every possible selection of
$n/2$ pairs $ab, cd, \ldots hk$, that can be formed out of the $n$ suffixes 1, 2, 3, ... n;
equation (1) is thus a particular case of (6).

Equation (6) is the theorem it is proposed to prove. The value of $q_{1^{l_1}2^{l_2}\ldots n^{l_n}}$
is at once found for given numerical values of the indices $l_1, l_2, \ldots l_n$ by writing
down (5) for $l_1 + l_2 + \ldots + l_n$ variables and identifying the values of $l_1$ of them with
that of the first and so on.
For example, if we require the value of $q_{1^22^23^2}$ we commence with
$$q_{123456} = S\,(r_{ab}r_{cd}r_{ef})$$
$$= r_{12}(r_{34}r_{56} + r_{35}r_{46} + r_{36}r_{45}) + r_{13}(r_{24}r_{56} + r_{25}r_{46} + r_{26}r_{45})$$
$$+ r_{14}(r_{23}r_{56} + r_{25}r_{36} + r_{26}r_{35}) + r_{15}(r_{23}r_{46} + r_{24}r_{36} + r_{26}r_{34})$$
$$+ r_{16}(r_{23}r_{45} + r_{24}r_{35} + r_{25}r_{34}) \qquad (7).$$
Identifying 4 with 1, 5 with 2 and 6 with 3 we find at once
$$q_{1^22^23^2} = 1 + 2r_{12}^2 + 2r_{23}^2 + 2r_{31}^2 + 8r_{12}r_{23}r_{31} \qquad (8).$$
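
The rule of equation (6), summing over every way of splitting the suffixes into pairs, lends itself to a short numerical check of (8). The sketch below is illustrative only: the function names `pairings` and `reduced_moment` and the three correlation values are hypothetical choices made for the example, not part of the paper.

```python
def pairings(items):
    """Yield every way of splitting the list `items` into unordered pairs.

    A list of odd length yields nothing, so the corresponding sum is zero.
    """
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def reduced_moment(suffixes, r):
    """Right-hand side of (6): sum over pairings of products of r[a][b]."""
    total = 0.0
    for p in pairings(list(suffixes)):
        term = 1.0
        for a, b in p:
            term *= r[a][b]
        total += term
    return total


# Hypothetical correlations, chosen only for the illustration.
r12, r23, r31 = 0.3, 0.5, -0.2
r = {1: {1: 1.0, 2: r12, 3: r31},
     2: {1: r12, 2: 1.0, 3: r23},
     3: {1: r31, 2: r23, 3: 1.0}}

# Repeating a suffix is the same as identifying variables, as in the text.
lhs = reduced_moment([1, 1, 2, 2, 3, 3], r)
rhs = 1 + 2 * r12**2 + 2 * r23**2 + 2 * r31**2 + 8 * r12 * r23 * r31
n_terms = sum(1 for _ in pairings(list(range(6))))   # number of terms in (7)
print(lhs, rhs, n_terms)
```

Listing a suffix more than once plays the part of "identifying" variables, so the same function also gives the one-variable values such as $q_{1^4}$.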


3. We note first that $q_{1^n}$, which in the more usual notation for distributions in
one variable is $\mu_n/\mu_2^{n/2}$, is known to have the value $1.3.5\ldots(n-1)$ when $n$ is even.
As regards $S\,(r_{ab}r_{cd}\ldots r_{hk})$, if all the variables are made identical, each term
becomes unity and the number of terms is the same as the number of ways of
breaking up an even number ($n$) of objects into ($n/2$) pairs. This last number is clearly

$$\frac{n!}{2!\,(n-2)!}\cdot\frac{(n-2)!}{2!\,(n-4)!}\cdots\frac{4!}{2!\,2!}\Big/(n/2)!$$

which also reduces to $1.3.5\ldots(n-1)$; thus equation (6) is correct for this
particular case.

Secondly let us consider the value of $q_{1^{n-1}2}$. The mean value of $x_2$ for a given
value of $x_1$ is $r_{12}\sigma_2x_1/\sigma_1$; let

$$x_2 = r_{12}\frac{\sigma_2}{\sigma_1}x_1 + X_2.$$

Then the distribution of $X_2$ for a given value of $x_1$ is itself normal and its $k$th
moment is zero for an odd $k$ and

$$1.3.5\ldots(k-1)\,({}_1\sigma_2)^k$$

for an even $k$, where ${}_1\sigma_2$ is the standard deviation of $x_2$ within the $x_1$ array, so that

$${}_1\sigma_2^2 = (1 - r_{12}^2)\,\sigma_2^2.$$

Hence
$$q_{1^{n-1}2} = \frac{1}{\sigma_1^{n-1}\sigma_2}\,\text{Mean value}\,(x_1^{n-1}x_2)$$
$$= \frac{1}{\sigma_1^{n-1}\sigma_2}\,\text{Mean}\left\{x_1^{n-1}\,\text{Mean}\left(r_{12}\frac{\sigma_2}{\sigma_1}x_1 + X_2\right)\right\}$$
$$= r_{12}\,\text{Mean}\,(x_1^n)/\sigma_1^n = 1.3.5\ldots(n-1)\,r_{12} \qquad (9).$$
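The method employed in the original proof of equation (1) is not convenient
for generalisation and we will now prove the equation

$$q_{1234} = r_{12}r_{34} + r_{13}r_{24} + r_{14}r_{23}$$

by the method that leads to the general case.

Putting as above
$$x_2 = r_{12}\frac{\sigma_2}{\sigma_1}x_1 + X_2, \qquad x_3 = r_{13}\frac{\sigma_3}{\sigma_1}x_1 + X_3, \qquad x_4 = r_{14}\frac{\sigma_4}{\sigma_1}x_1 + X_4,$$
we have
$$p_{1234} = \text{Mean of } (x_1x_2x_3x_4)$$
$$= \text{Mean of } \{x_1\,(\text{Mean of } x_2x_3x_4 \text{ for a given value of } x_1)\}$$
$$= \text{Mean of } x_1\left\{\text{Mean of } \left(r_{12}\frac{\sigma_2}{\sigma_1}x_1 + X_2\right)\left(r_{13}\frac{\sigma_3}{\sigma_1}x_1 + X_3\right)\left(r_{14}\frac{\sigma_4}{\sigma_1}x_1 + X_4\right)\right\}.$$

Now for normal distributions (and if the original distribution is normal, so is that
within the $x_1$ array), Mean $X_2 = 0$, Mean $X_2X_3X_4 = 0$, while

$$\text{Mean } X_2X_3 = ({}_1\sigma_2)({}_1\sigma_3)\,{}_1r_{23}$$
$$= \sigma_2\sqrt{1 - r_{12}^2}\;\sigma_3\sqrt{1 - r_{13}^2}\;\frac{r_{23} - r_{12}r_{13}}{\sqrt{1 - r_{12}^2}\sqrt{1 - r_{13}^2}}$$
$$= (r_{23} - r_{12}r_{13})\,\sigma_2\sigma_3 \qquad (10).$$

Hence
$$p_{1234} = \text{Mean of } x_1\Big[r_{12}r_{13}r_{14}\,\sigma_2\sigma_3\sigma_4\frac{x_1^3}{\sigma_1^3} + r_{12}\sigma_2\frac{x_1}{\sigma_1}(r_{34} - r_{13}r_{14})\,\sigma_3\sigma_4$$
$$+ r_{13}\sigma_3\frac{x_1}{\sigma_1}(r_{24} - r_{12}r_{14})\,\sigma_2\sigma_4 + r_{14}\sigma_4\frac{x_1}{\sigma_1}(r_{23} - r_{12}r_{13})\,\sigma_2\sigma_3\Big],$$
or dividing by $\sigma_1\sigma_2\sigma_3\sigma_4$,
$$q_{1234} = r_{12}r_{13}r_{14}\,q_{1^4} + q_{1^2}\{r_{12}(r_{34} - r_{13}r_{14}) + r_{13}(r_{24} - r_{12}r_{14}) + r_{14}(r_{23} - r_{12}r_{13})\}$$
$$= r_{12}r_{34} + r_{13}r_{24} + r_{14}r_{23},$$
since $q_{1^2} = 1$ and $q_{1^4} = 3$. Thus our formula is established for the case of four
variables.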


4. We will establish the case for $n$ variables by induction, and it will be
convenient to denote by ${}_1q_{234\ldots n}$ the value of the reduced product-moment coefficient
for the variables 2, 3, 4, ... n within the $x_1$ array, so that

$${}_1q_{234\ldots n} = \frac{\text{Mean value of } (X_2X_3\ldots X_n)}{({}_1\sigma_2)({}_1\sigma_3)\ldots({}_1\sigma_n)},$$

where $X_2, X_3, \ldots X_n$ denote as before the deviations of the variables from their
means within the $x_1$ array. Of course when $n$ is even, ${}_1q_{234\ldots n}$ is zero since $n - 1$ is
now odd.


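Let $n$ be even and assume that our formula has been proved true for all even
values of $n$ up to $n - 2$ inclusive, then

$$p_{123\ldots n} = \text{Mean}\,(x_1x_2x_3\ldots x_n)$$
$$= \text{Mean}\left\{x_1\left(r_{12}\sigma_2\frac{x_1}{\sigma_1} + X_2\right)\left(r_{13}\sigma_3\frac{x_1}{\sigma_1} + X_3\right)\ldots\left(r_{1n}\sigma_n\frac{x_1}{\sigma_1} + X_n\right)\right\}$$
$$= r_{12}r_{13}\ldots r_{1n}\,\sigma_2\sigma_3\ldots\sigma_n\,\text{Mean}\,(x_1^n)/\sigma_1^{n-1}$$
$$+ S\,\{(r_{1a}r_{1b}r_{1c}\ldots)(\sigma_a\sigma_b\sigma_c\ldots)\,\text{Mean}\,(X_\alpha X_\beta)\}\,\text{Mean}\,(x_1^{n-2})/\sigma_1^{n-3}$$
$$+ S\,\{(r_{1a}r_{1b}r_{1c}\ldots)(\sigma_a\sigma_b\sigma_c\ldots)\,\text{Mean}\,(X_\alpha X_\beta X_\gamma X_\delta)\}\,\text{Mean}\,(x_1^{n-4})/\sigma_1^{n-5}$$
$$+ \ldots$$
$$+ S\,\{r_{1a}\sigma_a\,\text{Mean}\,(X_\alpha X_\beta \ldots X_\nu)\}\,\text{Mean}\,(x_1^2)/\sigma_1 \qquad (11),$$

the summations in each line extending to all possible permutations of the suffixes
2, 3, 4, .... The last line for example being

$$\frac{\text{Mean}\,(x_1^2)}{\sigma_1}\{r_{12}\sigma_2\,\text{Mean}\,(X_3X_4\ldots X_n) + r_{13}\sigma_3\,\text{Mean}\,(X_2X_4\ldots X_n) + \ldots$$
$$+ r_{1n}\sigma_n\,\text{Mean}\,(X_2X_3\ldots X_{n-1})\}.$$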

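Now we have seen that Mean $(X_2X_3) = (r_{23} - r_{12}r_{13})\,\sigma_2\sigma_3$. Similarly,

$$\text{Mean}\,(X_2X_3X_4X_5) = ({}_1\sigma_2)({}_1\sigma_3)({}_1\sigma_4)({}_1\sigma_5)\,({}_1q_{2345})$$
$$= ({}_1\sigma_2)({}_1\sigma_3)({}_1\sigma_4)({}_1\sigma_5)\,[({}_1r_{23})({}_1r_{45}) + ({}_1r_{35})({}_1r_{24}) + ({}_1r_{25})({}_1r_{34})]$$
$$= \sigma_2\sigma_3\sigma_4\sigma_5\,[(r_{23} - r_{12}r_{13})(r_{45} - r_{14}r_{15}) + (r_{35} - r_{13}r_{15})(r_{24} - r_{12}r_{14})$$
$$+ (r_{25} - r_{12}r_{15})(r_{34} - r_{13}r_{14})],$$
and our assumption of the truth of equation (6) up to ($n - 2$) variables will enable
us to write down the mean value of every product of X’s occurring in (11).
Dividing by $\sigma_1\sigma_2\ldots\sigma_n$ we have, remembering that Mean $x_1^n/\sigma_1^n$ is $1.3.5\ldots(n-1)$,
$$q_{123\ldots n} = (r_{12}r_{13}\ldots r_{1n})\,1.3.5\ldots(n-1)$$
$$+ S\,\{r_{1a}r_{1b}r_{1c}\ldots(r_{\alpha\beta} - r_{1\alpha}r_{1\beta})\}\,1.3.5\ldots(n-3)$$
$$+ S\,\{r_{1a}r_{1b}r_{1c}\ldots S'[(r_{\alpha\beta} - r_{1\alpha}r_{1\beta})(r_{\gamma\delta} - r_{1\gamma}r_{1\delta})]\}\,1.3.5\ldots(n-5)$$
$$+ \ldots$$
$$+ S\,\{r_{1a}\,S'[(r_{\alpha\beta} - r_{1\alpha}r_{1\beta})(r_{\gamma\delta} - r_{1\gamma}r_{1\delta})(r_{\epsilon\zeta} - r_{1\epsilon}r_{1\zeta})\ldots]\} \qquad (12),$$
where $S'$ refers to permutations of $\alpha\beta\gamma\ldots$ only, and $S$ to permutations of all the
suffixes $a, b, c, \ldots \alpha, \beta, \gamma \ldots$, i.e. all the suffixes 2, 3, 4, ... n.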

It is clear that when the right-hand member of (12) is completely expanded
no terms can survive which contain as a factor more than one correlation coefficient
with suffix unity. This is easily verified in simple cases, and if in the general case
a term $r_{1a}r_{1b}r_{1c}\ldots$ survived, this term would reduce to $r_{12}^{\lambda}$ (with $\lambda > 1$) when we
identified the characters $a$, 2, 3, ..., which contradicts the value $1.3.5\ldots(n-1)\,r_{12}$ we
have already found for it (equation (9)).

The value of the right-hand member is therefore easily found by neglecting all
terms containing more than one such factor.

Hence on the assumption that (5) is true for all values of $n$ up to ($n - 2$) we find

$$q_{123\ldots n} = S\,\{r_{1a}\,S'(r_{\alpha\beta}r_{\gamma\delta}r_{\epsilon\zeta}\ldots)\};$$

but this is exactly the formula we wished to establish, for it is obvious that
$S\,(r_{ab}r_{cd}\ldots r_{hk})$, where $abc\ldots k$ is a permutation of $12\ldots n$, is equivalent to

$$S\,\{r_{1a}\,S'(r_{\alpha\beta}r_{\gamma\delta}\ldots)\}$$

where $a, \alpha, \beta, \gamma \ldots$ is a permutation of 2, 3, 4, ... n. Thus our formula, which has been
proved true for 4 variables, is seen by induction to be true in general.


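5. Formula (6) can be exhibited as a multiple definite integral. Let $\Delta$ denote
the determinant whose $k$th row consists of the elements

$$(r_{1k},\ r_{2k},\ \ldots\ r_{k-1,k},\ 1,\ r_{k+1,k},\ \ldots\ r_{nk})$$

and let $\Delta_{hk}$ denote the cofactor of the element in the $h$th row and $k$th column.

Let
$$\chi^2 = \frac{1}{\Delta}\left(\Sigma\,\Delta_{pp}\frac{x_p^2}{\sigma_p^2} + 2\Sigma\,\Delta_{pq}\frac{x_px_q}{\sigma_p\sigma_q}\right),$$
and
$$z = \frac{1}{(2\pi)^{n/2}\,\sigma_1\sigma_2\ldots\sigma_n\sqrt{\Delta}}\,e^{-\frac{1}{2}\chi^2},$$
then
$$\int_{-\infty}^{+\infty}\!\!\ldots\int_{-\infty}^{+\infty} z\,\frac{x_ax_bx_cx_d\ldots x_ux_v}{\sigma_a\sigma_b\sigma_c\sigma_d\ldots\sigma_u\sigma_v}\,dx_1\,dx_2\ldots dx_n = S\,(r_{ab}r_{cd}\ldots r_{uv}) \qquad (13),$$

where a, b, c, d, ... u, v are the suffixes 1, 2, 3, ... n in any possible order.

It is clear that (13) will enable us to write down the value of the multiple
integral $\int\!\ldots\!\int Pe^{-Q}\,dx_1\ldots dx_n$, where $P$ is any polynomial in $x_1, x_2, \ldots x_n$ and $Q$ a positive
quadratic form.

In fact, let $\Sigma\,a_{pp}x_p^2 + 2\Sigma\,a_{pq}x_px_q$ ($a_{pq} = a_{qp}$) be a positive, definite, quadratic
form, then

$$W = \int_{-\infty}^{+\infty}\!\!\ldots\int_{-\infty}^{+\infty} x_1^{\alpha_1}x_2^{\alpha_2}\ldots x_n^{\alpha_n}\exp -\tfrac{1}{2}\left(\Sigma\,a_{pp}x_p^2 + 2\Sigma\,a_{pq}x_px_q\right)dx_1\,dx_2\ldots dx_n$$
$$= (2\pi)^{n/2}\,\sigma_1^{\alpha_1+1}\sigma_2^{\alpha_2+1}\ldots\sigma_n^{\alpha_n+1}\sqrt{\Delta}\int_{-\infty}^{+\infty}\!\!\ldots\int_{-\infty}^{+\infty}\frac{x_1^{\alpha_1}x_2^{\alpha_2}\ldots x_n^{\alpha_n}}{\sigma_1^{\alpha_1}\sigma_2^{\alpha_2}\ldots\sigma_n^{\alpha_n}}\cdot\frac{1}{(2\pi)^{n/2}\,\sigma_1\sigma_2\ldots\sigma_n\sqrt{\Delta}}$$
$$\times\exp -\frac{1}{2\Delta}\left(\Sigma\,\Delta_{pp}\frac{x_p^2}{\sigma_p^2} + 2\Sigma\,\Delta_{pq}\frac{x_px_q}{\sigma_p\sigma_q}\right)dx_1\,dx_2\ldots dx_n$$
$$= (2\pi)^{n/2}\,\sigma_1^{\alpha_1+1}\sigma_2^{\alpha_2+1}\ldots\sigma_n^{\alpha_n+1}\sqrt{\Delta}\;\Sigma\,[r_{ab}r_{cd}\ldots r_{hk}],$$

where $abc\ldots hk$ is any permutation of the $\alpha_1 + \alpha_2 + \ldots + \alpha_n$
suffixes of which $\alpha_1$ are equal to 1, $\alpha_2$ are equal to 2 and so on.

Let $D$ denote the determinant of the quadratic form and $D_{pq}$ the cofactor of
$a_{pq}$. The two multiple integrals will be identical if
$$1 = \sigma_1^2\sigma_2^2\ldots\sigma_{p-1}^2\sigma_{p+1}^2\ldots\sigma_n^2\,\Delta D_{pp},$$
$$r_{pq} = \sigma_1^2\ldots\sigma_{p-1}^2\,\sigma_p\,\sigma_{p+1}^2\ldots\sigma_{q-1}^2\,\sigma_q\,\sigma_{q+1}^2\ldots\sigma_n^2\,\Delta D_{pq}.$$
Hence $r_{pq}^2 = [D_{pq}]^2/D_{pp}D_{qq}$ and $\sigma_p^2 = D_{pp}/D$, while $\Delta = D^{n-1}/D_{11}D_{22}\ldots D_{nn}$,
so that
$$W = \frac{(2\pi)^{n/2}}{D^{(m+1)/2}}\,\Sigma\,[D_{ab}D_{cd}\ldots D_{hk}] \qquad (13'),$$
where a, b, ... h, k is a permutation as above, and $m = \alpha_1 + \alpha_2 + \ldots + \alpha_n$ is even.
$W = 0$ when $m$ is odd.

As an illustration of this result:

$$\int\!\!\int\!\!\int_{-\infty}^{+\infty}(Mx^2y^2z^2 + Nx^2yz)\exp -\tfrac{1}{2}(ax^2 + by^2 + cz^2 + 2fyz + 2gzx + 2hxy)\,dx\,dy\,dz$$
$$= \frac{(2\pi)^{3/2}}{\Delta^{7/2}}\,M\,(8FGH + 2AF^2 + 2BG^2 + 2CH^2 + ABC) + \frac{(2\pi)^{3/2}}{\Delta^{5/2}}\,N\,(2GH + AF),$$

where A, B, C, F, G, H are the cofactors of a, b, c, f, g, h in
$$\Delta = \begin{vmatrix} a & h & g \\ h & b & f \\ g & f & c \end{vmatrix}.$$
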
A cognate result is discussed by Mr Arthur Black in the Transactions of the
Cambridge Philosophical Society*. Black’s integral is $\int Ve^{-U}dx_1\ldots dx_n$, where $V$
and $U$ are any quadratic functions, the only restriction on $U$ being that it should
be essentially positive. Other particular cases have been dealt with in the paper
previously quoted, and for the case of two variables several results are given by
Mr H. E. Soper†.

For reference we add a table of values of the reduced product-moment
coefficients that occur frequently in formulae for probable errors and similar work.
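$q_{1^4} = 3.$
$q_{1^32} = 3r_{12}.$
$q_{1^22^2} = 1 + 2r_{12}^2.$
$q_{1^223} = r_{23} + 2r_{12}r_{13}.$
$q_{1^6} = 15.$
$q_{1^52} = 15r_{12}.$
$q_{1^42^2} = 3 + 12r_{12}^2.$
$q_{1^32^3} = 9r_{12} + 6r_{12}^3.$
$q_{1^32^23} = 3\,(r_{13} + 2r_{23}r_{12} + 2r_{13}r_{12}^2).$
$q_{1^423} = 3\,(r_{23} + 4r_{12}r_{13}).$
$q_{1^22^23^2} = 1 + 2r_{23}^2 + 2r_{31}^2 + 2r_{12}^2 + 8r_{12}r_{23}r_{31}.$
$q_{1^8} = 105$, $q_{1^72} = 105r_{12}$, $q_{1^62^2} = 15\,(6r_{12}^2 + 1).$
$q_{1^52^3} = 15\,(4r_{12}^3 + 3r_{12}).$
$q_{1^42^4} = 3\,(8r_{12}^4 + 24r_{12}^2 + 3).$
$q_{1^{\lambda}23} = 1.3\ldots(\lambda - 1)\,(r_{23} + \lambda r_{12}r_{13}).$ $\lambda$ even.
$q_{1^{\lambda}2^23} = 1.3.5\ldots\lambda\,[(\lambda - 1)\,r_{12}^2r_{13} + r_{13} + 2r_{12}r_{23}].$ $\lambda$ odd.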
For the case of two variables we add the following formula, which is easily proved
by the methods employed in this paper.

$$q_{1^u2^v} = \psi\,(u+v)\,r^v + \binom{v}{2}\psi\,(2)\,\psi\,(u+v-2)\,r^{v-2}(1 - r^2)$$
$$+ \binom{v}{4}\psi\,(4)\,\psi\,(u+v-4)\,r^{v-4}(1 - r^2)^2 + \ldots ‡$$
the series terminating. Here
$$\psi\,(2m) = 1.3.5\ldots(2m - 1)$$
and
$$\binom{v}{m} = \frac{v\,(v-1)\ldots(v-m+1)}{m!}.$$
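
As a check on the two-variable series, it can be compared with the pairing rule of equation (6). The sketch below is an illustration under stated assumptions: it takes $u + v$ even, treats $\psi(0)$ as 1, and the names `psi`, `q_series` and `q_pairs` are hypothetical helpers, not notation from the paper. The recursion in `q_pairs` pairs off one suffix 1 at a time, either with another 1 (factor 1) or with a 2 (factor $r$).

```python
from math import comb


def psi(m):
    """psi(m) = 1*3*5*...*(m-1) for even m >= 0, with psi(0) = 1."""
    out = 1
    for k in range(1, m, 2):
        out *= k
    return out


def q_series(u, v, r):
    """Terminating series for the reduced moment, assuming u + v is even."""
    return sum(comb(v, j) * psi(j) * psi(u + v - j)
               * r ** (v - j) * (1 - r * r) ** (j // 2)
               for j in range(0, v + 1, 2))


def q_pairs(u, v, r):
    """Same moment from the pairing rule (6), one suffix at a time."""
    if u == 0 and v == 0:
        return 1.0
    if u == 0:
        # only suffixes 2 remain: pair one of them with any of the others
        return (v - 1) * q_pairs(0, v - 2, r) if v >= 2 else 0.0
    same = (u - 1) * q_pairs(u - 2, v, r) if u >= 2 else 0.0
    cross = v * r * q_pairs(u - 1, v - 1, r) if v >= 1 else 0.0
    return same + cross


r = 0.4  # hypothetical correlation, for illustration only
for u, v in [(4, 2), (3, 3), (5, 3), (4, 4)]:
    print(u, v, q_series(u, v, r), q_pairs(u, v, r))
```

The two routes should agree for every pair of indices with $u + v$ even, and for $u = v = 4$ both can be compared with the tabulated $3\,(8r^4 + 24r^2 + 3)$.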


* Vol. XVI, 1898, pp. 219—227.
† Biometrika, vol. IX, p. 101.
‡ This is virtually the formula (xxxii) employed by H. E. Soper, l.c., corrected for some misprints.











ON THE MATHEMATICAL EXPECTATION OF THE 
MOMENTS OF FREQUENCY DISTRIBUTIONS. 


By PROFESSOR AL. A. TCHOUPROFF of Petrograd. 


INTRODUCTION 
I 


(1) One of my pupils, O. Anderson, in a brief exposition* of his researches on 
the Variate Difference Correlation Method in Biometrika (1914), draws attention 
to the superiority of the method of mathematical expectation over the methods 
usually employed by English statisticians. The small popularity enjoyed by the 
method of mathematical expectation in England is not of course accidental. 


English scientific tradition rejects the concept of “mathematical probability.” 


From the time of R. L. Ellis and of the first edition of John Stuart Mill’s 
System of Logic, the logician’s basis of probability has, in England, been the notion 
of empirical frequency. English mathematicians have followed the lead of the 
writers on logic in their preference for the idea of statistical frequency, and the 
method of mathematical expectation has naturally shared the fate of the concept 
of mathematical probability on which it rests. 


Notwithstanding its deep-rooted historical basis, English statisticians should
break with this tradition. The substitution of statistical frequency for
mathematical probability does not obviate the logical difficulties in laying the foundations
for a statistical study of Causation, but merely shifts them elsewhere. The gain
from the point of view of philosophical representation is sufficiently doubtful,
while from the purely mathematical point of view the rejection of the ideas of
mathematical probability and mathematical expectation is accompanied by very
substantial disadvantages. Verbal formulation becomes very complicated, leading
to loss of economy of attention: it is continually necessary to speak of “the
statistical frequencies which would become established if the number of occurrences
were infinitely great.” The absence of a sharp distinction in terminology between
statistical frequency in the exact meaning of the term and those quasi-empirical
“frequencies which would become established in an indefinitely great number of
occurrences” often fails to make the very statement of the problem clear to the
reader, and occasionally, it would appear, to the author: when reading published
papers one not infrequently feels that the author does not give himself a full
account as to what it is he is really calculating.

* Anderson’s research was carried out under my supervision in the statistical seminary attached to
the Economics Department of the Petrograd Polytechnic Institute; the results he obtained were to
have been published in extenso in the Proceedings (Students’ Section) of the Economics Department,
but the War drew Mr Anderson away from his scientific pursuits to other work of a more practical
character and the complete publication of his researches had to be postponed.


Little harm follows so long as the problems dealt with are comparatively 
simple. But at the present time there are problems waiting for solution which 
are so complex that the slightest obscurity in their formulation threatens to 
become a source of error in the final deductions. 


When we start with “mathematical probability” and “mathematical expecta- 
tion” as a foundation we substantially simplify the mathematical exposition. The 
logical analysis of the conclusions to which we are led is not injuriously affected 
by the substitution of one set of terms for the other during the calculations. 


(2) If the variable magnitude $X$ can take the values $\xi_1, \xi_2, \ldots \xi_k$ with
probabilities $p_1, p_2, \ldots p_k$, I call the system of values $\xi_1, \xi_2, \ldots \xi_k$ and the values
$p_1, p_2, \ldots p_k$ associated with them “the law of distribution of the values of the
variable $X$.” The law of distribution of values lies at the base of empirical
“frequency curves,” just as the mathematical probability of an event lies at the
base of its statistically established frequency.


Denoting by the symbol $EX$ the mathematical expectation of the variable
magnitude $X$, we have as is well known:

$$EX = \sum_{i=1}^{k} p_i \xi_i, \qquad \text{where} \quad \sum_{i=1}^{k} p_i = 1.$$
I call the variable magnitudes $X, Y, Z, \ldots$ mutually independent, if the law of
distribution of each of them remains one and the same whatever values are given
to the others. In this case $EX$ remains constant for all possible values of the
variables $Y, Z, \ldots$.
If the law of distribution of $X$ does not remain the same for different values of
$Y, Z, \ldots$, the variables $X, Y, Z, \ldots$ are mutually dependent. The mathematical
expectation of the variable $X$ on the supposition that $Y$ has received the value $\eta_j$,
$Z$ the value $\zeta_l$, etc., I denote by $E^{(\eta_j \zeta_l \ldots)} X$ and call it the “conditional
mathematical expectation of $X$ on the supposition that the remaining variables have
received definite values.”
It follows from the definitions that

$$E(X + Y + Z + \ldots) = EX + EY + EZ + \ldots$$

both in the case when the variables are mutually independent, and when they are
correlated, and that

$$E(XYZ\ldots) = (EX)(EY)(EZ)\ldots,$$

if the variables are mutually independent.


In the case in which $X$ and $Y$ are correlated we have:

$$EXY = \sum_{i=1}^{k} p_i \xi_i E^{(\xi_i)} Y, \qquad EY = \sum_{i=1}^{k} p_i E^{(\xi_i)} Y.$$
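The relations of this section can be checked on a small example. The following is a minimal sketch in Python, assuming a hypothetical joint law of two correlated variables; the table `joint` and all function names are illustrative and do not come from the paper. It computes $EXY$ and $EY$ both directly and through the conditional expectations $E^{(\xi_i)} Y$.

```python
from fractions import Fraction as F

# Hypothetical joint law of two correlated variables: joint[(x, y)] = probability.
joint = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 1): F(1, 6), (1, 2): F(1, 3)}

def expect(g):
    """Mathematical expectation of g(X, Y) under the joint law."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

def law_of_x():
    """Law of distribution of X alone: value -> probability."""
    law = {}
    for (x, _), p in joint.items():
        law[x] = law.get(x, F(0)) + p
    return law

def conditional_expectation_of_y(x0):
    """E^{(x0)} Y: expectation of Y on the supposition that X = x0."""
    weight = law_of_x()[x0]
    return sum(p * y for (x, y), p in joint.items() if x == x0) / weight

def exy_via_conditional():
    """EXY = sum_i p_i xi_i E^{(xi_i)} Y."""
    return sum(p * x * conditional_expectation_of_y(x) for x, p in law_of_x().items())

def ey_via_conditional():
    """EY = sum_i p_i E^{(xi_i)} Y."""
    return sum(p * conditional_expectation_of_y(x) for x, p in law_of_x().items())
```

Exact rational arithmetic is used so that the two ways of computing each expectation can be compared with `==` rather than with a tolerance.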


II 


(1) In investigations in the theory of probability we frequently have to deal
with expressions of the type $N(N-1)(N-2)\ldots(N-k+1)$. Following the
example of Capelli*, I use the slightly modified notation:

$$N(N+1)(N+2)\ldots(N+k-1) = N^{[k]}, \qquad N(N-1)(N-2)\ldots(N-k+1) = N^{[-k]}.$$

Let

$$N^{k} = \sum_{i=1}^{k} \alpha_{k,i} N^{[-i]} \quad\ldots(1),$$

$$N^{[-k]} = \sum_{j=0}^{k-1} (-1)^{j} \beta_{k,j} N^{k-j} \quad\ldots(2).$$

The coefficients I have denoted by $\alpha$, $\beta$ are beginning to play an important
part in the theory of finite differences† and are of the first importance in all
investigations into the law of large numbers. Their properties were first studied
systematically in Chapter III of Kramp’s well-known work, Analyse des réfractions
astronomiques et terrestres; some of their properties were discovered by investigators
studying Bernoulli’s numbers; recently they have received the attention of
the Italian mathematical school associated with Cesàro and Capelli. The methods
I employ to solve fundamental problems of mathematical statistics are directly
founded on certain properties of the $\alpha$, $\beta$ coefficients. In view of the fact that I
shall later on frequently make use of these methods, I state here, without proof,
those properties of the coefficients $\alpha$ and $\beta$ that I shall have to quote in the
present paper‡.


(2) We have:

$$\alpha_{k,k} = 1, \qquad \alpha_{k,1} = 1, \qquad \alpha_{k,i} = i\,\alpha_{k-1,i} + \alpha_{k-1,i-1} \quad\ldots(3),$$

$$\alpha_{k,i} = \frac{i^{k} - C_i^{1}(i-1)^{k} + \ldots + (-1)^{h} C_i^{h}(i-h)^{k} + \ldots + (-1)^{i-1} C_i^{i-1}}{1.2.3\ldots i} = \frac{\Delta^{i} 0^{k}}{1.2.3\ldots i} \quad\ldots(4).$$
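The defining relations of the coefficients $\alpha$ and $\beta$ can be verified numerically. The sketch below is written in Python and rests on a reading of the recurrences that should be treated as an assumption: $\alpha_{k,i} = i\,\alpha_{k-1,i} + \alpha_{k-1,i-1}$ and $\beta_{k,j} = \beta_{k-1,j} + (k-1)\,\beta_{k-1,j-1}$ with $\beta_{k,0} = 1$. On that reading the $\alpha$ are what are now called Stirling numbers of the second kind and the $\beta$ are unsigned Stirling numbers of the first kind. The function names are illustrative only.

```python
from math import comb, factorial, prod

def alpha(k, i):
    """alpha_{k,i}, computed by the recurrence read from (3)."""
    if i < 1 or i > k:
        return 0
    if i == k or i == 1:
        return 1
    return i * alpha(k - 1, i) + alpha(k - 1, i - 1)

def alpha_explicit(k, i):
    """alpha_{k,i} = Delta^i 0^k / i!, the closed form (4); needs k >= 1."""
    total = sum((-1) ** h * comb(i, h) * (i - h) ** k for h in range(i + 1))
    return total // factorial(i)

def beta(k, j):
    """beta_{k,j}, with beta_{k,0} = 1 and beta_{k,k-1} = (k-1)!."""
    if j < 0 or j > k - 1:
        return 0
    if j == 0:
        return 1
    return beta(k - 1, j) + (k - 1) * beta(k - 1, j - 1)

def falling(n, k):
    """N^{[-k]} = N (N-1) ... (N-k+1)."""
    return prod(n - t for t in range(k))

def power_from_falling(n, k):
    """Right-hand side of (1): sum_i alpha_{k,i} N^{[-i]}."""
    return sum(alpha(k, i) * falling(n, i) for i in range(1, k + 1))

def falling_from_powers(n, k):
    """Right-hand side of (2): sum_j (-1)^j beta_{k,j} N^{k-j}."""
    return sum((-1) ** j * beta(k, j) * n ** (k - j) for j in range(k))
```

For any integers `n >= 0` and `k >= 1`, `power_from_falling(n, k)` should equal `n ** k` and `falling_from_powers(n, k)` should equal `falling(n, k)`.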





* Vide Capelli: “Instituzione di analisi” and the same author’s “L’analisi algebrica e l’interpretazione
fattoriale delle potenza.” (Giornale di matematica di Battaglini, Vol. XXXI.)

† Cf. A. A. Markoff, Calculus of Finite Differences (2nd edition).

‡ Readers interested in the proofs of these properties, many of them established for the first time by
myself, will find a complete analysis in my paper in the Proceedings of the Petrograd Polytechnic
Institute.
Putting

$$\frac{k(k-1)\ldots(k-h+1)}{1.2.3\ldots h} = C_k^{h},$$

i.e. denoting by $C_k^{h}$ the number of combinations of $k$ elements $h$ at a time, we may
express $\alpha_{k,k-n}$ in the form:

$$\alpha_{k,k-n} = \sum_{i=0}^{n-1} A_{n,i}\, C_k^{2n-i} \quad\ldots(5),$$
i=0 





where the coefficients $A_{n,i}$ are independent of $k$ and are defined by the relations:

$$A_{n,0} = (2n-1)\,A_{n-1,0},$$
$$A_{n,j} = (n-j)\,A_{n-1,j-1} + (2n-j-1)\,A_{n-1,j} \quad\ldots(6),$$
$$A_{n,n-1} = A_{n-1,n-2} = 1.$$
Hence
$$A_{n,0} = 1.3.5\ldots(2n-1),$$
$$A_{n,1} = 1.3.5\ldots(2n-1)\,\tfrac{1}{3}\,[n-1]^{[-1]},$$
Aneo=1.3.5...(2n—3) {h[n — 1] +43 [n-1]}™} 
Ans= 1.38.5... (2n—38) {a [n — 2J-9 4+ Ag [n—- 2} + A [n-2]?} | AT) 
An g=1.3.5...(2n—5) {hy [n — 2] + yy [n — 2] + YS, [n — 2] ‘f 
+ x [n — 2}} 
Ans = 1.38.5... (2n—5) {rpyq [n — 3] + ghey [n -— 3] + gy [n- 3] 
+ zb5[n —3]} + shy [n — 87 
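The coefficient $A_{n,n-i}$ can easily be expressed in an independent form.
Putting $1.2.3\ldots t = t!$,
we shall have:

$$A_{n,n-i} = \sum \frac{(n+i)!}{i_1!\, i_2! \ldots i_f!\; [h_1!]^{i_1} [h_2!]^{i_2} \ldots [h_f!]^{i_f}} \quad\ldots(8),$$

where the summation extends to all possible positive integer values of $i_1, i_2, \ldots i_f$,
satisfying the relation: $i_1 + i_2 + \ldots + i_f = i$, and to all integer values of $h_1, h_2, \ldots h_f$,
satisfying the conditions
$$2 \le h_1 < h_2 < \ldots < h_f,$$
$$h_1 i_1 + h_2 i_2 + \ldots + h_f i_f = n + i.$$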


Introducing the notation 


ae 
Nol @=V0F@-j)= % (-V ae | (9), 
and noting that 
tie a 
Vase = CO 





we find from (5): 
When $h > 0$ and $2n - h < k$ we have:


(3) Of the properties of the $\beta$ coefficients it is essential to note the following:


where the coefficients B;, ; are independent of k and are determined by the relations 


"a a, ye as 0 

_ a alates. gener Fe nee 
Nae 
Ve» =4,, 21.8.5 

Veo%.s- ee 

7 a, k-n = 


$$\beta_{k,0} = 1,$$

$$\beta_{k,j} = \beta_{k-1,j} + (k-1)\,\beta_{k-1,j-1},$$

$$\beta_{k,k-1} = 1.2.3\ldots(k-1).$$
i cca? : 
Be, j = 2 B; .Cp- 


B,. =1.38.5...(2j—1) 
B; 3 = (2j)-t— 1) (By + 
B; j= 1 Se Pee, | 


Hence we have: 


Bio =1.3.! 
By, =1.8. 
By2=1.38. 


Bi3=1 


B5=1.3. 


Or 


.Qj-1) 
(3-1) 3-1) 

.. (2) — 3) (4 Lj — 1] 4 [Lj -1]-} 
+ (25 — 8) (Lj — 218 + § G29 + gL — 2IH 
+» (29 — 5) infs [ — 23-9 + $ (9 — 23-9 + 95 Lj — 2}-4 


.-j — 5) {ss (i 31 + A Li - 3 +3 
+HL-3I +9 - 


From (12) we find, when $h > 0$,








js! 
pee yaj -t-f 
Vi, By, 3 B,C 
vt +h 
Vw B, ;= 
v2- h S yt 
Ae Ss Re 


Vey i=0 (or 27 -k) 


vi B,, =B,=1.3.5... 


J» 


i B,. j = B. 


J, 2j-k 


k+h i 
Vie) By; led 


k-2nt+h 


..(2n—1)/ 


j:t ~ k-2jt+h 


(2j - 1) 





Bj-,i] ne bgp deeneewanzonts (18). 


| 
| 
| 


(14). 
+$)- 20-9 | 
‘j- 3" 
3]! ) 
\ 
NDE Roe (15). 
(4) Further, it is important to note certain relations connecting the coefficients
$\alpha$ and $\beta$:


n—k 
= (- 1y On, k+i Busi. MEW ccs deskecosvacccteesccates (16), 
i=0 
k+m-1 ; 
An, n—k Buz, = = C,2#+2"-4 R,, Mi  eeeecccccccccscces (17), 


where the coefficients $R_{k,m,i}$ are independent of $n$ and are determined by the
relations:


Ro,m,i= Brn, i : 
ary 
Ry, n,o = (2k + 2m — 1) [Rea mo + Re, mol ...(18). 
Ry, mi = (2k + 2m — 1 — t) [Remi + Re, mai] 
+ (ke + 2m — t) [Ream + Re, ma,i-1] — Re,ma,in 
From (18) we find 
Ruso<1.8.5... (28-1) 05 


Ry t= ie 1.3.5... (2s —1) {CF + C*,_,} 
. J 
and, in general, Ro ona & Tap J Oreng cvcceseccrccveccecseccosess (20), 
=0 


where the coefficients 7, ,,; are independent of k and are determined by the 
relations : 


Ts,h,o = A,r 


h 
> 14,5 = Bar . (21). 
j=0 
Tei, (2s —h-1) Taki (s +) = h) Pee SE fey (s —)) Ms—i,h-1,j-1 
Putting 
l : : 
a w—j yl—j—1 : ) 
ae™ om {(C3,-0-; + Cn -1-j-11 Vs, 2h,j) - 
Sloe Lae Lo OM Se * en) ee ce ’ 
bey 
, ae mes vem, 
Cnt Cn-1-3 Ys, 2h+1,j 
j=0 
we find further: 
h 
3 ae ys-k-l 
RK, s—-k,2h tnd t, h,l C,on4 
scebtecesctusse (23), 
h : 4 
i ae s-k-l Py ys—k-l-1t 
R, s—k,2h+1 lot t 8, h,l [C,-on +2 ° Ci_ast-11 


Biometrika XII
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where 
teo=1.3.5...(28—1) . 
ts:,0=1.3.5 ... (2s —3) {5 [s— 1]—!4+4[s— 1]-} 
te1=1.38.5... (28 — 3) {4[s- 1] 4+ [s— 1]}} 
ts,0,0= 1.3.5 ~..(26—5) {ghg[s—2] “4 [8 — 2-14 fy [s—2] 4 gy [s—2])} 
te 1: =1.3.5...(28s—5) {a45 [s—2]-1+ Fy [s—2]-) + $4 [s—2] 14-4 [s—2]-)} 
te o=1.3.5...(28—5) [g85[s—2]-144 [s— 2] 4-48 [s—2]-14+ 4 [s - 2]-)} 
t.00=1.3.5...(28—1)4[s— 1] \ 
t's1,0=1.38,5...(2s- 3) {3 [s — 2]-44+ 4 [s — 2] + 4 [s —2]-} 
1.3.5... (2s —3) {4 [s — 2]-9+4[s — 2}! + 3 [s — 2}-} 
t's.20=1.3.5...(28 — 5) {xatgg[s — 3]! + ah, [s — 3] + 745 [8 — 3] 
+ che [s — 3] + aby [8 -— 8] } 7 .-.(25). 
t’s,:= 1.3.5... (28 —5) {aArs [s — 3] + oh [s — 3] + J [s -— 3]. 
+ $$ [s — 33-4 + A [s— 3} 
t's02=1.8.5... (2s — 5) {iq [s — BJ) +. A, [c — 3] + $4 [s — 3] 
+48 [e— 3] + $[s- 3) 
From (17) we find, when h > 0, 


k+m-1 \ 


> (24), 











VisSa 0-2 Pista = =. ico 
ee n,n—-k B, 2079 
= h-i 
— ‘ et i Bis as Fim n-2k—2m+h | ...(26). 
Vier ene Ba-ne Rego H1-3.5... (2+ 2m—1) CF, 
Ves a, n—k Precis m or R, m,n-2k—-2Qm 
Veo a, n—k es =0 





(5) In Chapter IV we shall have to deal with more complicated expressions 

of type: 
a r, 

tr) Voy AV Geers Ars, ra—he +++ Up, rp—hz Brytret.. +1rp—hy—hg—...—hy, f—hy—he—...—hy* 

In my previously quoted paper they are not considered, as I met them for the
first time in connection with the problems considered in the fourth Chapter of the
present paper. My discussion of these expressions has not so far led me to results
which may be considered final, and I shall merely indicate the method by which
their fundamental properties may be established.

Putting + 7r.t+...+7%=R, h,+h.+...+he=H, let us replace a and 8 by 
their values from (5) and (12). Noting that, as is well known, 

[a ty] = E Oygi ali yi-on—an, 
i= 


t= 
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we find: 
" T "; 
veo ie ita Ve) a,,, y—h, Gy,, Tq—he tee Or, rp—hy Br_n, S-H 


hs g-1 hy =1f-H-1 Aj,,t, Ang.tg «++ Any, te By-a, j 


4=04=0  h=0 j=0 (2h,—1,)!(2h.—l,)!... (2hy—L) ! (2f— 2H—j)! 


r; T; ", 
xVo,V (ry Verd 





rf @h—) fh by)) , 


ry heh] [r, 3 h, +1, — h, +... tT hy] 24-51 \ 








a i? tg Any Atyt --- Any ty Bru, j EE Stee aw 
t j=0 (2h; —1,)!(2h,—4)!... (2hy— be)! 9 gi! go! --- gua! (2f— 2H —j —g)! 
x oe gi-Gh— bn [r; oS Ayo . on [rz toa de bee 
ve. "eile (re my hy.) or Von) aie (rx is hy 22-5 ) 
sche (27), 


hy=1 hg-1—y- 
> 


, 


where S denotes * 
1 4=0%,=0 i=0 

S denotes =a a er bite fa = Banas 

g 97,=0 92=0 9k-,=0 

and J=N+tGot +--+ Je-- 

If we note that 7S. rf er-Wl fr, — h,j-9) = 0, when r, > 2h, —1, + g,, ete., we 
see without difficulty that, when 2/< R, the sum we are discussing is equal to 
zero. If 2f= R, then the only non-vanishing term in the sum is the one corre- 
sponding to l,=1,=...=]=j =0 and g,=7,—2h,, g2=1'2— 2hy, «.-; Je-r= Te-1— Zhy_, 
2f— 2H —g=r;,— 2h,, and the sum reduces to 

0,,™ 0,,™ eee C,,2¢ Aj,,0 Aj,,0 eee Ano Br o cccccevceccces (28). 
2 


If 2f=R-+1, there are three types of non-vanishing terms: (1) terms for 
which 1, =1,=...=],=j=0, and for which, of the quantities 9, Ses sss Shaws 
2f — 2H —g, one, e.g. g:, is equal to r;—2h;+1, and each of the rest is equal to 
r—2h; (2) terms, for which 1,=1,=...=],=0, j=1 and all the quantities 
gi =i — 2h;; (3) terms, for which 7=0, one of the quantities /, e.g. l;=1, and 
the other quantities / vanish, g;=7r;—2h;+1, and the remaining quantities 
g are each equal to r — 2h. 

Noting that VX X —hyt-4 
becomes r!h(r—2h+ 1) when k=7 — 2h+1 and X =r, we can without difficulty 
reduce the sum we are considering, for the case 2f= R+1 to: 

Cr Cn oe Ot Ayo Ate +» Anjo Bast 


y2h,; fy2h, 2hy 
+ C.. C. coe i An,o Ah,,o zie A hyo Bes, P 
°) , 


v2h,-1 sy2h, Rhy 
+ C,. r. eee C.. An, Ah,,0 eee An,,0 Bru, > » (29). 
2 ‘ | 


y2h, py2h,-1 y2hy ' 
+0 oon Oe Aj,,0 Aj, --- Anjo Bass H Te 
2 > +44) 





"k-1 (Tk 


12h, fy2h, 2he-y fr2he-1 
+? C.. re 3; C Ayo Abg,o +++ Ano Ang Bazi, 0 


CHAPTER I


I


Consider a variable magnitude $X$, admitting the values $\xi_1, \xi_2, \ldots \xi_k$ with
probabilities $p_1, p_2, \ldots p_k$. Let us make $N$ experiments, and suppose that the law
of distribution of the values of the variable remains unaltered, and that the
separate experiments are independent of one another. Denoting by $X_i$ the value
taken by the variable in the $i$th experiment and by $n_j$ the number of times, out of
the $N$ experiments, that the variable $X$ takes the value $\xi_j$, let


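$$p'_j = \frac{n_j}{N},$$

$$m_r = EX^r = EX_i^r = \sum_{j=1}^{k} p_j \xi_j^{\,r},$$

$$\mu_r = E(X - m_1)^r = E(X_i - m_1)^r = \sum_{j=1}^{k} p_j (\xi_j - m_1)^r,$$

$$X_{(N)} = \frac{1}{N}\sum_{i=1}^{N} X_i = \frac{1}{N}\sum_{j=1}^{k} n_j \xi_j = \sum_{j=1}^{k} p'_j \xi_j \quad\ldots(1),$$

$$m_{r,(N)} = E X_{(N)}^{\,r},$$

$$\mu_{r,(N)} = E\,[X_{(N)} - m_1]^r.$$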





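We have, whatever be the law of distribution of the variable $X$:

$$\sum_{j=1}^{k} p_j = \sum_{j=1}^{k} p'_j = 1, \qquad \sum_{j=1}^{k} n_j = N,$$

$$m_{1,(N)} = m_1, \qquad \mu_{1,(N)} = \mu_1 = 0, \qquad \mu_{0,(N)} = \mu_0 = 1.$$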


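We find further, without difficulty:

$$\mu_2 = m_2 - m_1^2,$$
$$\mu_3 = m_3 - 3 m_2 m_1 + 2 m_1^3,$$
$$\mu_4 = m_4 - 4 m_3 m_1 + 6 m_2 m_1^2 - 3 m_1^4,$$
$$\mu_r = m_r - C_r^{1} m_{r-1} m_1 + \ldots + (-1)^h C_r^{h} m_{r-h} m_1^h + \ldots + (-1)^{r-2} C_r^{r-2} m_2 m_1^{r-2} + (-1)^{r-1}(r-1)\, m_1^r \quad\ldots(2),$$
$$\mu_{r,(N)} = m_{r,(N)} - C_r^{1} m_{r-1,(N)} m_1 + \ldots + (-1)^h C_r^{h} m_{r-h,(N)} m_1^h + \ldots + (-1)^{r-2} C_r^{r-2} m_{2,(N)} m_1^{r-2} + (-1)^{r-1}(r-1)\, m_1^r.$$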
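The general relation between $\mu_r$ and the moments $m_r$ about zero lends itself to a direct check. The following Python sketch assumes a hypothetical three-valued law `example_law`; the helper names are illustrative. It evaluates $\mu_r$ once from the definition and once from the expansion in $m_{r-h} m_1^h$, whose last two binomial terms have been merged into $(-1)^{r-1}(r-1)m_1^r$.

```python
from fractions import Fraction as F
from math import comb

# Hypothetical law of distribution {value: probability}, used only for illustration.
example_law = {0: F(1, 2), 1: F(1, 3), 3: F(1, 6)}

def raw_moment(law, r):
    """m_r = E X^r."""
    return sum(p * F(x) ** r for x, p in law.items())

def central_moment(law, r):
    """mu_r = E (X - m_1)^r, straight from the definition."""
    m1 = raw_moment(law, 1)
    return sum(p * (F(x) - m1) ** r for x, p in law.items())

def central_from_raw(m, r):
    """mu_r from the list m = [m_0, m_1, ..., m_r]; valid for r >= 1."""
    head = sum((-1) ** h * comb(r, h) * m[r - h] * m[1] ** h for h in range(r - 1))
    return head + (-1) ** (r - 1) * (r - 1) * m[1] ** r
```

With `m = [raw_moment(example_law, h) for h in range(6)]`, the call `central_from_raw(m, r)` should agree with `central_moment(example_law, r)` for every `r` from 1 to 5.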
Conversely, expressing the quantities $m_r$ in terms of $\mu_r$, we find:

$$m_r = E\{(X - m_1) + m_1\}^r = m_1^r + \sum_{h=2}^{r-1} C_r^{h}\, m_1^{r-h} \mu_h + \mu_r,$$

$$m_{r,(N)} = m_1^r + \sum_{h=2}^{r-1} C_r^{h}\, m_1^{r-h} \mu_{h,(N)} + \mu_{r,(N)} \quad\ldots(3),$$

$$\sum_{h=0}^{r} (-1)^h C_r^{h}\, m_{r-h}\, m_{h,(N)} = \sum_{h=0}^{r} (-1)^h C_r^{h}\, \mu_{r-h}\, \mu_{h,(N)}.$$
= =0 





II 
(1) Noting that

$$X_{(N)} = \frac{1}{N}\sum_{i=1}^{N} X_i,$$

we write $\left[\sum_{i=1}^{N} X_i\right]^r$ in the form

$$\sum_{j}\ \sum_{i_1, i_2, \ldots i_j}\ \sum_{r_1, r_2, \ldots r_j} \frac{r!}{r_1!\, r_2! \ldots r_j!}\; X_{i_1}^{r_1} X_{i_2}^{r_2} \ldots X_{i_j}^{r_j},$$

where, as is known, the summation with regard to $j$ extends from 1 to the smaller
of the two integers $r$ and $N$; the summation with regard to $i_1, i_2, \ldots i_j$ extends to
all integral and unequal values of $i_1, i_2, \ldots i_j$ from 1 to $N$, and the last summation
extends to all positive integer or zero values of $r_1, r_2, \ldots r_j$ satisfying the relation
$r_1 + r_2 + \ldots + r_j = r$.


Passing to mathematical expectations we find hence

$$m_{r,(N)} = \frac{1}{N^r} \sum_{j} \frac{N^{[-j]}}{1.2.3\ldots j} \sum \frac{r!}{r_1!\, r_2! \ldots r_j!}\; m_{r_1} m_{r_2} \ldots m_{r_j},$$

where the summation with regard to $j$ extends to all positive integer values from
1 to the smaller of the two numbers $r$ and $N$, while the second sum extends to
all positive integer values of $r_1, r_2, \ldots r_j$, satisfying the relation:

$$r_1 + r_2 + \ldots + r_j = r.$$

If $r < N$, we have consequently:

$$m_{r,(N)} = \frac{1}{N^r} \sum_{j=1}^{r} N^{[-j]} R_{r,j} \quad\ldots(4),$$

where the $R_{r,j}$ coefficients are independent of $N$ and are defined by the relation:

$$R_{r,j} = \frac{1}{1.2.3\ldots j} \sum \frac{r!}{r_1!\, r_2! \ldots r_j!}\; m_{r_1} m_{r_2} \ldots m_{r_j},$$

and the summation extends over the ranges specified above.
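Relation (4) can be tested against a direct enumeration of all possible samples. The Python sketch below reads $R_{r,j}$ as $\frac{1}{j!}\sum \frac{r!}{r_1! \ldots r_j!} m_{r_1} \ldots m_{r_j}$, the sum running over ordered sets of positive integers $r_1, \ldots r_j$ with total $r$; that reading, the helper names, and the example law used in the check are assumptions made for illustration.

```python
from fractions import Fraction as F
from itertools import product
from math import factorial, prod

def compositions(r, j):
    """All ordered j-tuples of positive integers summing to r."""
    if j == 1:
        yield (r,)
        return
    for first in range(1, r - j + 2):
        for rest in compositions(r - first, j - 1):
            yield (first,) + rest

def R(r, j, m):
    """R_{r,j} for raw moments m = [m_0, m_1, ...]."""
    total = sum(
        F(factorial(r), prod(factorial(t) for t in c)) * prod(m[t] for t in c)
        for c in compositions(r, j)
    )
    return total / factorial(j)

def falling(n, j):
    """N^{[-j]} = N (N-1) ... (N-j+1)."""
    return prod(n - t for t in range(j))

def mean_raw_moment(r, n, m):
    """m_{r,(N)} by relation (4)."""
    return sum(falling(n, j) * R(r, j, m) for j in range(1, r + 1)) / F(n) ** r

def mean_raw_moment_brute(r, n, law):
    """E[X_(N)^r] by enumerating every N-tuple of values of the law."""
    total = F(0)
    for xs in product(law, repeat=n):
        total += prod(law[x] for x in xs) * F(sum(xs), n) ** r
    return total
```

Because $N^{[-j]}$ vanishes for $j > N$, the sum in `mean_raw_moment` can always run up to `r`, so the function can also be called with `n` smaller than `r`.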
Hence we find:

$$R_{r,r-h} = \sum_{i} \frac{r^{[-(h+i)]}}{1.2.3\ldots i}\; m_1^{r-h-i} \sum \frac{m_{h_1} m_{h_2} \ldots m_{h_i}}{h_1!\, h_2! \ldots h_i!},$$

where the summation with regard to $i$ extends to all integral values from 1 to the
smaller of the two numbers $h$ and $r-h$, and the second summation extends to all
positive integer values not less than 2 of $h_1, h_2, \ldots h_i$, satisfying the condition:

$$h_1 + h_2 + \ldots + h_i = h + i,$$

or

$$R_{r,r-h} = \sum_{i} r^{[-(h+i)]}\, m_1^{r-h-i} \sum \frac{m_{h_1}^{j_1} m_{h_2}^{j_2} \ldots m_{h_f}^{j_f}}{j_1!\, j_2! \ldots j_f!\; [h_1!]^{j_1} [h_2!]^{j_2} \ldots [h_f!]^{j_f}} \quad\ldots(5),$$

where the summation for $i$ extends to all integer values from 1 to the smaller of
the numbers $h$ and $r-h$, and the second summation extends to all positive integer
values of $j_1, j_2, \ldots j_f$, satisfying the equation:

$$j_1 + j_2 + \ldots + j_f = i,$$

and to all positive integer values of $h_1, h_2, \ldots h_f$, satisfying the conditions:

$$2 \le h_1 < h_2 < \ldots < h_f,$$
$$h_1 j_1 + h_2 j_2 + \ldots + h_f j_f = h + i.$$
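We find hence:

$$R_{r,r} = m_1^{r},$$
$$R_{r,r-1} = C_r^{2}\, m_1^{r-2} m_2,$$
$$R_{r,r-2} = 1.3\, C_r^{4}\, m_1^{r-4} m_2^2 + C_r^{3}\, m_1^{r-3} m_3,$$
$$R_{r,r-3} = 1.3.5\, C_r^{6}\, m_1^{r-6} m_2^3 + 10\, C_r^{5}\, m_1^{r-5} m_2 m_3 + C_r^{4}\, m_1^{r-4} m_4 \quad\ldots(6),$$
$$R_{r,r-4} = 1.3.5.7\, C_r^{8}\, m_1^{r-8} m_2^4 + 105\, C_r^{7}\, m_1^{r-7} m_2^2 m_3 + C_r^{6}\, m_1^{r-6}\,[15 m_2 m_4 + 10 m_3^2] + C_r^{5}\, m_1^{r-5} m_5,$$

and on the other hand

$$R_{r,1} = m_r,$$
$$R_{r,2} = \tfrac{1}{2} \sum_{h=1}^{r-1} C_r^{h}\, m_h m_{r-h} \quad\ldots(7),$$

and so on.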


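(2) The calculation of the coefficients $R_{r,r-h}$ may also be effected by another
method.

Noting that

$$E\left[\sum_{i=1}^{N} X_i\right]^{r+1} = N\,E\left\{X_1 \left[\sum_{j=1}^{N} X_j\right]^{r}\right\},$$

we find

$$m_{r+1,(N)} = \frac{1}{N^{r}}\, E\left\{X_1 \left[\sum_{j=1}^{N} X_j\right]^{r}\right\} = \frac{1}{N^{r}}\, E\left\{X_1 \sum_{h=0}^{r} C_r^{h}\, X_1^{r-h} \left[\sum_{j=2}^{N} X_j\right]^{h}\right\}$$

$$= \frac{1}{N^{r}} \left\{ m_{r+1} + \sum_{h=1}^{r-1} C_r^{h}\, m_{r+1-h}\,(N-1)^{h}\, m_{h,(N-1)} + (N-1)^{r}\, m_1\, m_{r,(N-1)} \right\}$$

$$= \frac{1}{N^{r}} \left\{ m_{r+1} + \sum_{h=1}^{r} C_r^{h}\,(N-1)^{h}\, m_{r+1-h}\, m_{h,(N-1)} \right\} \quad\ldots(8).$$

Substituting here (see (4))

$$(N-1)^{h}\, m_{h,(N-1)} = \sum_{i=1}^{h} [N-1]^{[-i]} R_{h,i},$$

we find:

$$m_{r+1,(N)} = \frac{1}{N^{r}} \left\{ m_{r+1} + \sum_{h=1}^{r} C_r^{h}\, m_{r+1-h} \sum_{i=1}^{h} [N-1]^{[-i]} R_{h,i} \right\}$$

$$= \frac{1}{N^{r+1}} \left\{ N m_{r+1} + \sum_{i=1}^{r} N^{[-(i+1)]} \sum_{h=i}^{r} C_r^{h}\, m_{r+1-h}\, R_{h,i} \right\}.$$

On the other hand

$$m_{r+1,(N)} = \frac{1}{N^{r+1}} \sum_{i=1}^{r+1} N^{[-i]} R_{r+1,i}.$$

Hence

$$R_{r+1,1} = m_{r+1},$$
$$R_{r+1,i} = \sum_{h=i-1}^{r} C_r^{h}\, m_{r+1-h}\, R_{h,i-1} \quad\ldots(9),$$
$$R_{r+1,r+1} = m_1 R_{r,r} = m_1^{r+1}.$$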

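(3) Putting (see Introduction (2))

$$N^{[-j]} = \sum_{f=0}^{j-1} (-1)^{f} \beta_{j,f}\, N^{j-f},$$

we find from (4)

$$m_{r,(N)} = \frac{1}{N^{r}} \sum_{j=1}^{r} \sum_{f=0}^{j-1} (-1)^{f} N^{j-f} \beta_{j,f}\, R_{r,j} = \frac{1}{N^{r}} \sum_{i=1}^{r} N^{i} \sum_{j=i}^{r} (-1)^{j-i} \beta_{j,j-i}\, R_{r,j}$$

$$= m_1^{r} + \sum_{i=1}^{r-1} \frac{1}{N^{i}} \sum_{h=0}^{i} (-1)^{h} \beta_{r-i+h,h}\, R_{r,r-i+h} \quad\ldots(10)$$

$$= m_1^{r} + \frac{1}{N}\, C_r^{2}\, m_1^{r-2} \mu_2 + \frac{1}{N^{2}} \left\{ 1.3\, C_r^{4}\, m_1^{r-4} \mu_2^2 + C_r^{3}\, m_1^{r-3} \mu_3 \right\} + \ldots$$

When $r = 2, 3, 4$ we find:

$$m_{2,(N)} = m_1^{2} + \frac{1}{N}\, \mu_2,$$

$$m_{3,(N)} = m_1^{3} + \frac{3}{N}\, m_1 \mu_2 + \frac{1}{N^{2}}\, \mu_3 \quad\ldots(11),$$

$$m_{4,(N)} = m_1^{4} + \frac{6}{N}\, m_1^{2} \mu_2 + \frac{1}{N^{2}}\,[4 m_1 \mu_3 + 3 \mu_2^{2}] + \frac{1}{N^{3}}\,[\mu_4 - 3 \mu_2^{2}].$$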
III
(1) If $m_1 = 0$, then
$$\mu_r = E(X - m_1)^r = EX^r = m_r,$$
and $\mu_{r,(N)} = m_{r,(N)}$.

Putting $m_1 = 0$ in the formulae of § II, we may, consequently, replace in them
the quantities $m$ by the corresponding quantities $\mu$. But when $m_1 = 0$, $R_{r,r-h} = 0$,
if $h < r/2$; but if $h \ge r/2$, then $R_{r,r-h}$ reduces (cf. (5)) to

$$T_{r,r-h} = \sum \frac{r!\; \mu_{h_1}^{j_1} \mu_{h_2}^{j_2} \ldots \mu_{h_f}^{j_f}}{j_1!\, j_2! \ldots j_f!\; [h_1!]^{j_1} [h_2!]^{j_2} \ldots [h_f!]^{j_f}} \quad\ldots(12),$$

where the summation extends to all positive integer values of $j_1, j_2, \ldots j_f$, satisfying
the condition $j_1 + j_2 + \ldots + j_f = r - h$, and to all positive integer values of
$h_1, h_2, \ldots h_f$, satisfying the conditions:

$$2 \le h_1 < h_2 < \ldots < h_f,$$

$$h_1 j_1 + h_2 j_2 + \ldots + h_f j_f = r.$$


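Hence

$$\mu_{2r,(N)} = \frac{1}{N^{2r}} \sum_{h=r}^{2r-1} N^{[-(2r-h)]}\, T_{2r,2r-h} = \frac{1}{N^{2r}} \sum_{h=0}^{r-1} N^{[-(r-h)]}\, T_{2r,r-h},$$

$$\mu_{2r+1,(N)} = \frac{1}{N^{2r+1}} \sum_{h=r+1}^{2r} N^{[-(2r+1-h)]}\, T_{2r+1,2r+1-h} = \frac{1}{N^{2r+1}} \sum_{h=0}^{r-1} N^{[-(r-h)]}\, T_{2r+1,r-h} \quad\ldots(13),$$

or

$$\mu_{r,(N)} = \frac{1}{N^{r}} \sum_{h=0}^{\mathrm{Ent.}(r/2)-1} N^{[-(\mathrm{Ent.}(r/2)-h)]}\, T_{r,\,\mathrm{Ent.}(r/2)-h} \quad\ldots(14),$$

where $\mathrm{Ent.}\left(\frac{r}{2}\right)$ denotes the greatest integer in $\frac{r}{2}$.

If we now put

$$N^{[-(r-h)]} = \sum_{i=0}^{r-h-1} (-1)^{i} \beta_{r-h,i}\, N^{r-h-i},$$

we find after some transformations:

$$\mu_{2r,(N)} = \sum_{i=0}^{r-1} \frac{1}{N^{r+i}} \sum_{h=0}^{i} (-1)^{h} \beta_{r-i+h,h}\, T_{2r,r-i+h},$$

$$\mu_{2r+1,(N)} = \sum_{i=0}^{r-1} \frac{1}{N^{r+1+i}} \sum_{h=0}^{i} (-1)^{h} \beta_{r-i+h,h}\, T_{2r+1,r-i+h} \quad\ldots(15),$$

or

$$\mu_{r,(N)} = \sum_{i=0}^{\mathrm{Ent.}(r/2)-1} \frac{1}{N^{\,r-\mathrm{Ent.}(r/2)+i}} \sum_{h=0}^{i} (-1)^{h} \beta_{\mathrm{Ent.}(r/2)-i+h,\,h}\; T_{r,\,\mathrm{Ent.}(r/2)-i+h} \quad\ldots(16).$$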
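When the order is even (equal to $2r$),

$$T_{2r,r} = 1.3.5\ldots(2r-1)\, \mu_2^{r},$$

$$T_{2r,r-h} = (2r)! \sum_{i} \frac{\mu_2^{\,r-h-i}}{2^{\,r-h-i}\,(r-h-i)!} \sum \frac{\mu_{h_1}^{j_1} \mu_{h_2}^{j_2} \ldots \mu_{h_f}^{j_f}}{j_1!\, j_2! \ldots j_f!\; [h_1!]^{j_1} [h_2!]^{j_2} \ldots [h_f!]^{j_f}}$$

$$= 1.3.5\ldots(2r-1) \sum_{i} r^{[-(h+i)]}\, 2^{h+i}\, \mu_2^{\,r-h-i} \sum \frac{\mu_{h_1}^{j_1} \mu_{h_2}^{j_2} \ldots \mu_{h_f}^{j_f}}{j_1!\, j_2! \ldots j_f!\; [h_1!]^{j_1} [h_2!]^{j_2} \ldots [h_f!]^{j_f}} \quad\ldots(17),$$

where the summation for $i$ extends to all integer values from 1 to the smaller of
the numbers $2h$ and $r-h$, and the last summation to all positive integer values
of $j_1, j_2, \ldots j_f$, satisfying the condition:

$$j_1 + j_2 + \ldots + j_f = i,$$

and to all positive integer values of $h_1, h_2, \ldots h_f$, satisfying the conditions:

$$3 \le h_1 < h_2 < \ldots < h_f,$$
$$h_1 j_1 + h_2 j_2 + \ldots + h_f j_f = 2h + 2i.$$

When the order is odd (equal to $2r+1$),

$$T_{2r+1,r} = 1.3.5\ldots(2r+1)\, \tfrac{1}{3}\, r\, \mu_2^{\,r-1} \mu_3,$$

$$T_{2r+1,r-h} = (2r+1)! \sum_{i} \frac{\mu_2^{\,r-h-i}}{2^{\,r-h-i}\,(r-h-i)!} \sum \frac{\mu_{h_1}^{j_1} \mu_{h_2}^{j_2} \ldots \mu_{h_f}^{j_f}}{j_1!\, j_2! \ldots j_f!\; [h_1!]^{j_1} [h_2!]^{j_2} \ldots [h_f!]^{j_f}}$$

$$= 1.3.5\ldots(2r+1) \sum_{i} r^{[-(h+i)]}\, 2^{h+i}\, \mu_2^{\,r-h-i} \sum \frac{\mu_{h_1}^{j_1} \mu_{h_2}^{j_2} \ldots \mu_{h_f}^{j_f}}{j_1!\, j_2! \ldots j_f!\; [h_1!]^{j_1} [h_2!]^{j_2} \ldots [h_f!]^{j_f}} \quad\ldots(18),$$

where the summation for $i$ extends to all integer values from 1 to the smaller of
the numbers $2h+1$ and $r-h$, and the last summation extends to all positive
integer values of $j_1, j_2, \ldots j_f$, satisfying the condition:

$$j_1 + j_2 + \ldots + j_f = i,$$

and to all integer values of $h_1, h_2, \ldots h_f$, satisfying the conditions:

$$3 \le h_1 < h_2 < \ldots < h_f,$$

$$h_1 j_1 + h_2 j_2 + \ldots + h_f j_f = 2h + 2i + 1.$$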


The coefficients $T$ may also be calculated by means of recurrence formulae.
Putting $m_1 = 0$ in (8) and replacing the $m$’s by $\mu$’s, we find:

$$\mu_{r+1,(N)} = \frac{1}{N^{r}} \left\{ \mu_{r+1} + \sum_{h=2}^{r-1} C_r^{h}\, \mu_{r+1-h}\,(N-1)^{h}\, \mu_{h,(N-1)} \right\},$$

and hence (or directly from (9))

$$T_{r+1,1} = \mu_{r+1},$$

$$T_{r+1,i} = \sum_{h=i-1}^{r-1} C_r^{h}\, \mu_{r+1-h}\, T_{h,i-1}.$$
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We find in this way : 
Tre =1.8.5...(2r—1) py" 
Toppa = 1.3.5... (2r— 1) (hr) pyr py + r-9) wl pe} 
Tenet 00.8 
Fg pal pps + gig pal ps? iy 
Terie =1.3.5...(2r+1) drwy py 
Pri312%1.3.5... 


T +1, re=l. Se. : 


T—6 


+ Ar-4 w-* y,3 Bat ayy 7 pe pos 5} 


Hence: 


pet 86. (r= 1) 5 He 


1 
Fp FB pal? pa + OY pal? pos? — $20 py] 


(27 +1) {yy pal? ps + pg rl) pa? urg pr, 
+2741 pr ps} 

» (Zr +1) {gh rl) pel? py + hy Th pel pos fs 

Fro Mal pais + hg) pal as? bs + hy 1) wal pane 


(20 — 1) (aby 789 pla poy + yg 8 posh a? 
+ ght wy ps} 


) 





/ 


1 . 
+ rn [vor Ba * ba + Ber pal po bs + pg P— pg pe 


r—l 
7 Ti id ps? bam TZ ri-3) pr Ma ao ahe yl-] 1 ae bs! 


fa den2 — 1) (r — 3) rl-3) peo ps 





— rls] | 


1 
Mer+1,(N) = 1.3.5 ...(r+ 1) Sie Ar py" ws 
ir 
+ re 





yr! pr? Ms+ dari pat? s/s 


1 
+ re 


+ hg 8 pl ps? ps — 





r—5 





(r— =e tras 


+ ar py I-86 u3 w,— 


_ (=I) (r-2)(r- r(8r—1) 4 


79 one _— 





4 
) i pl pd + 
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On the other hand, 


P'xr,1 = for 
a eas =(2r)! lat [r TP 


Ph bh Mor—h 
+z hi(2r—h)! 


Tr41,1 = Por+i 


r—-1 
j=’ 2 c’ Mh Har 


a 
h 
Tors, = 2. Cains Ph Por+i-h 


) 
nba 


+ ar [TA a ps? ae - yl—2] pt p.| 
de TAOS pag ® py gy TY ple pg oe + hy TE pag pas 
Peg” —® ps phy? 


ri] pe? Mah, ++ sho yl) ag* pe? 


Gem 


1 -e 


\. 
‘ 
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2 
a 


\ 








.-.(20), 


...(21). 


(22), 


(28). 


CORO eee eee emer eee aeere 
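For $r = 2, 3, 4, 5, 6, 7, 8$ we find hence:

$$\mu_{2,(N)} = \frac{1}{N}\, \mu_2,$$

$$\mu_{3,(N)} = \frac{1}{N^{2}}\, \mu_3,$$

$$\mu_{4,(N)} = \frac{1}{N^{4}} \left\{ N \mu_4 + 3 N^{[-2]} \mu_2^{2} \right\} = \frac{3}{N^{2}}\, \mu_2^{2} + \frac{1}{N^{3}}\,[\mu_4 - 3\mu_2^{2}],$$

$$\mu_{5,(N)} = \frac{1}{N^{5}} \left\{ N \mu_5 + 10 N^{[-2]} \mu_2 \mu_3 \right\} = \frac{10}{N^{3}}\, \mu_2 \mu_3 + \frac{1}{N^{4}}\,[\mu_5 - 10 \mu_2 \mu_3],$$

$$\mu_{6,(N)} = \frac{1}{N^{6}} \left\{ N \mu_6 + N^{[-2]}\,[15 \mu_2 \mu_4 + 10 \mu_3^{2}] + 15 N^{[-3]} \mu_2^{3} \right\}$$
$$= \frac{15}{N^{3}}\, \mu_2^{3} + \frac{5}{N^{4}}\,[3 \mu_2 \mu_4 + 2 \mu_3^{2} - 9 \mu_2^{3}] + \frac{1}{N^{5}}\,[\mu_6 - 15 \mu_2 \mu_4 - 10 \mu_3^{2} + 30 \mu_2^{3}],$$

$$\mu_{7,(N)} = \frac{1}{N^{7}} \left\{ N \mu_7 + N^{[-2]}\,[21 \mu_2 \mu_5 + 35 \mu_3 \mu_4] + 105 N^{[-3]} \mu_2^{2} \mu_3 \right\} \quad\ldots(26)$$
$$= \frac{105}{N^{4}}\, \mu_2^{2} \mu_3 + \frac{7}{N^{5}}\,[3 \mu_2 \mu_5 + 5 \mu_3 \mu_4 - 45 \mu_2^{2} \mu_3] + \frac{1}{N^{6}}\,[\mu_7 - 21 \mu_2 \mu_5 - 35 \mu_3 \mu_4 + 210 \mu_2^{2} \mu_3],$$

$$\mu_{8,(N)} = \frac{1}{N^{8}} \left\{ N \mu_8 + N^{[-2]}\,[28 \mu_2 \mu_6 + 56 \mu_3 \mu_5 + 35 \mu_4^{2}] + N^{[-3]}\,[210 \mu_2^{2} \mu_4 + 280 \mu_2 \mu_3^{2}] + 105 N^{[-4]} \mu_2^{4} \right\}$$
$$= \frac{105}{N^{4}}\, \mu_2^{4} + \frac{70}{N^{5}}\,[3 \mu_2^{2} \mu_4 + 4 \mu_2 \mu_3^{2} - 9 \mu_2^{4}] + \frac{7}{N^{6}}\,[4 \mu_2 \mu_6 + 8 \mu_3 \mu_5 + 5 \mu_4^{2} - 90 \mu_2^{2} \mu_4 - 120 \mu_2 \mu_3^{2} + 165 \mu_2^{4}]$$
$$+ \frac{1}{N^{7}}\,[\mu_8 - 28 \mu_2 \mu_6 - 56 \mu_3 \mu_5 - 35 \mu_4^{2} + 420 \mu_2^{2} \mu_4 + 560 \mu_2 \mu_3^{2} - 630 \mu_2^{4}].$$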
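The expressions of (26) can be confirmed by brute force for small $N$. The Python sketch below enumerates every possible sample from a hypothetical three-valued law and compares the exact moments of the arithmetic mean with the lines for $r = 4, 5, 6$, written here in the form $\frac{3}{N^2}\mu_2^2 + \frac{1}{N^3}[\mu_4 - 3\mu_2^2]$ and so on. The law and all names are assumptions made for illustration.

```python
from fractions import Fraction as F
from itertools import product
from math import prod

# Hypothetical three-valued law {value: probability}, used only for the check.
example_law = {0: F(1, 2), 1: F(1, 3), 3: F(1, 6)}

def central_moments(law, upto):
    """[mu_0, mu_1, ..., mu_upto] of a discrete law."""
    mean = sum(p * x for x, p in law.items())
    return [sum(p * (x - mean) ** r for x, p in law.items()) for r in range(upto + 1)]

def mean_central_moment_brute(law, n, r):
    """mu_{r,(N)} = E[X_(N) - m_1]^r by enumerating every N-tuple of values."""
    mean = sum(p * x for x, p in law.items())
    return sum(
        prod(law[x] for x in xs) * (F(sum(xs), n) - mean) ** r
        for xs in product(law, repeat=n)
    )

def mu4_of_mean(mu, n):
    n = F(n)
    return 3 * mu[2] ** 2 / n**2 + (mu[4] - 3 * mu[2] ** 2) / n**3

def mu5_of_mean(mu, n):
    n = F(n)
    return 10 * mu[2] * mu[3] / n**3 + (mu[5] - 10 * mu[2] * mu[3]) / n**4

def mu6_of_mean(mu, n):
    n = F(n)
    return (15 * mu[2] ** 3 / n**3
            + 5 * (3 * mu[2] * mu[4] + 2 * mu[3] ** 2 - 9 * mu[2] ** 3) / n**4
            + (mu[6] - 15 * mu[2] * mu[4] - 10 * mu[3] ** 2 + 30 * mu[2] ** 3) / n**5)
```

The enumeration grows as $k^N$ for a $k$-valued law, so it is only practical as a check for a handful of observations.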


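(2) Noting that

$$X_{(N)} - m_1 = \frac{1}{N} \sum_{i=1}^{N} (X_i - m_1) = \frac{1}{N} \sum_{i=1}^{h} (X_i - m_1) + \frac{1}{N} \sum_{i=h+1}^{N} (X_i - m_1)$$

$$= \frac{1}{N} \left\{ h\,[X_{(h)} - m_1] + (N-h)\,[X_{(N-h)} - m_1] \right\},$$

we find

$$[X_{(N)} - m_1]^{r} = \frac{1}{N^{r}} \sum_{i=0}^{r} C_r^{i}\, h^{r-i} (N-h)^{i}\, [X_{(h)} - m_1]^{r-i}\, [X_{(N-h)} - m_1]^{i},$$

$$\mu_{r,(N)} = \frac{1}{N^{r}} \sum_{i=0}^{r} C_r^{i}\, h^{r-i} (N-h)^{i}\, \mu_{r-i,(h)}\, \mu_{i,(N-h)},$$

$$N^{r} \mu_{r,(N)} = h^{r} \mu_{r,(h)} + \sum_{i=2}^{r-2} C_r^{i}\, h^{r-i} \mu_{r-i,(h)}\, (N-h)^{i} \mu_{i,(N-h)} + (N-h)^{r} \mu_{r,(N-h)}.$$

Putting $h = 1$, we obtain

$$N^{r} \mu_{r,(N)} = \mu_r + \sum_{i=2}^{r-2} C_r^{i}\, (N-1)^{i}\, \mu_{r-i}\, \mu_{i,(N-1)} + (N-1)^{r} \mu_{r,(N-1)}.$$

Hence:

$$N^{r} \mu_{r,(N)} - (N-1)^{r} \mu_{r,(N-1)} = \mu_r + \sum_{i=2}^{r-2} C_r^{i}\, (N-1)^{i}\, \mu_{r-i}\, \mu_{i,(N-1)},$$

and

$$N^{r} \mu_{r,(N)} = N \mu_r + \sum_{i=2}^{r-2} C_r^{i}\, \mu_{r-i} \sum_{j=1}^{N-1} (N-j)^{i}\, \mu_{i,(N-j)} \quad\ldots(27).$$

If we give $r$ in turns the values 2, 3, 4, ..., we find the relations (26).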
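Relation (27) translates directly into a recursive procedure. The Python sketch below takes it in the form $N^r \mu_{r,(N)} = N\mu_r + \sum_{i=2}^{r-2} C_r^i \mu_{r-i} \sum_{j=1}^{N-1} (N-j)^i \mu_{i,(N-j)}$, which is an assumed reading of the relation; the names `make_mean_central` and `scaled` are illustrative.

```python
from fractions import Fraction as F
from functools import lru_cache
from math import comb

def make_mean_central(mu):
    """mu = [mu_0, mu_1, mu_2, ...] with mu_0 = 1, mu_1 = 0; returns (r, n) -> mu_{r,(n)}."""
    @lru_cache(maxsize=None)
    def scaled(r, n):
        """n^r * mu_{r,(n)}, built up by relation (27)."""
        if r == 0:
            return F(1)
        if r == 1 or n == 0:
            return F(0)
        inner = sum(
            comb(r, i) * mu[r - i] * sum(scaled(i, n - j) for j in range(1, n))
            for i in range(2, r - 1)
        )
        return n * mu[r] + inner

    def mean_central(r, n):
        return scaled(r, n) / F(n) ** r

    return mean_central
```

As a usage example, `make_mean_central(mu)(4, n)` should reproduce $\frac{3}{N^2}\mu_2^2 + \frac{1}{N^3}[\mu_4 - 3\mu_2^2]$ for any list `mu` with `mu[0] == 1` and `mu[1] == 0`, since the relation is an algebraic identity in the $\mu_h$.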
IV


(1) From the relations (15) and (17) we find:

$$\frac{\mu_{2r,(N)}}{\mu_{2,(N)}^{\,r}} = 1.3.5\ldots(2r-1) \left\{ 1 + \sum_{i=1}^{r-1} \frac{1}{N^{i}} \sum_{h=0}^{i} (-1)^{h} \beta_{r-i+h,h}\, \frac{T_{2r,r-i+h}}{1.3.5\ldots(2r-1)\, \mu_2^{\,r}} \right\}.$$

As $N$ increases the ratio $\frac{\mu_{2r,(N)}}{\mu_{2,(N)}^{\,r}}$ consequently tends to the limit $1.3.5\ldots(2r-1)$, if

$$\sum_{i=1}^{r-1} \frac{1}{N^{i}} \sum_{h=0}^{i} (-1)^{h} \beta_{r-i+h,h}\, \frac{T_{2r,r-i+h}}{1.3.5\ldots(2r-1)\, \mu_2^{\,r}}$$

tends to zero.
But if this last expression tends to a limit different from zero as $N$ increases,
then the limit to which $\frac{\mu_{2r,(N)}}{\mu_{2,(N)}^{\,r}}$ tends cannot be equal to $1.3.5\ldots(2r-1)$.

The quantity $\frac{\beta_{r-i+h,h}}{1.3.5\ldots(2r-1)}$ does not become infinitely great for
any value of $r$ and is independent both of the value of $N$ and of the law of
distribution of values of $X$. In order that $\frac{\mu_{2r,(N)}}{\mu_{2,(N)}^{\,r}}$ should tend to $1.3.5\ldots(2r-1)$
it appears then to be a sufficient condition that $\frac{T_{2r,r-i+h}}{N^{i} \mu_2^{\,r}}$ should tend
to zero for $i = 1, 2, 3, \ldots r-1$. A sufficient condition for this, in its turn, is that
expressions of type

$$\frac{\mu_{h_1}^{j_1} \mu_{h_2}^{j_2} \ldots \mu_{h_f}^{j_f}}{N^{i}\, \mu_2^{\,i-h+l}}$$

should tend to zero, when the quantities $j_1, j_2, \ldots j_f$ are connected by the relation:
$j_1 + j_2 + \ldots + j_f = l$, and the quantities $h_1, h_2, \ldots h_f$ satisfy the relation:

$$h_1 j_1 + h_2 j_2 + \ldots + h_f j_f = 2(i - h + l),$$

and $l$ can take all integral values between 1 and $2(i-h)$. Finally this condition
is satisfied, if expressions of type

$$\frac{N \mu_i}{[N \mu_2]^{i/2}}$$

tend to zero, as $N$ increases, when $i = 3, 4, 5, \ldots, 2r-1, 2r$. Noting that when
these conditions are satisfied the fraction $\frac{\mu_{2r+1,(N)}}{\mu_{2,(N)}^{(2r+1)/2}}$ tends to zero, we arrive at
the well-known result (cf. A. A. Markoff, Theory of Probability, 3rd Edition,
pp. 329-330).

The probability of the fulfilment of the inequality

$$t_1 < \frac{1}{\sqrt{2 N \mu_2}} \sum_{i=1}^{N} (X_i - m_1) < t_2$$

tends with increasing $N$ to the limit $\frac{1}{\sqrt{\pi}} \int_{t_1}^{t_2} e^{-t^{2}}\, dt$, whatever be the law of
distribution of values of the variable $X$, if only it satisfies the condition, that
$\frac{N \mu_i}{[N \mu_2]^{i/2}}$ should tend with increasing $N$ to zero, when $i = 3, 4, 5, \ldots \infty$, and if at
the same time the law of distribution of values remains unaltered, and the separate
experiments are mutually independent*.
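(2) From (22) we find:

$$\frac{\mu_{2r,(N)}}{1.3.5\ldots(2r-1)\, \mu_{2,(N)}^{\,r}} = 1 + \frac{1}{N} \left\{ \frac{r(r-1)}{2}\, \frac{\mu_4 - 3\mu_2^{2}}{3 \mu_2^{2}} + \frac{r(r-1)(r-2)}{9}\, \frac{\mu_3^{2}}{\mu_2^{3}} \right\} + \ldots$$

Thus we see that $\frac{\mu_{2r,(N)}}{\mu_{2,(N)}^{\,r}}$ tends the more slowly to $1.3.5\ldots(2r-1)$, the more
the law of distribution of values deviates from the Gauss-Laplace law, and the
greater $r$ is. If $\mu_4 > 3\mu_2^{2}$, then for sufficiently large values of $N$,

$$\frac{\mu_{2r,(N)}}{\mu_{2,(N)}^{\,r}} > 1.3.5\ldots(2r-1).$$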
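The rate of approach to the Gaussian value can be illustrated numerically for $r = 3$. The Python sketch below compares the exact ratio $\mu_{6,(N)} / \mu_{2,(N)}^3$, taken from the $r = 6$ line of (26), with the two leading terms $1.3.5\ldots(2r-1)\{1 + \frac{1}{N}[\frac{r(r-1)}{6}\frac{\mu_4 - 3\mu_2^2}{\mu_2^2} + \frac{r(r-1)(r-2)}{9}\frac{\mu_3^2}{\mu_2^3}]\}$. The form of the first-order term is a reconstruction and should be treated as an assumption, as should the function names.

```python
from fractions import Fraction as F

def ratio6_exact(mu, n):
    """mu_{6,(N)} / mu_{2,(N)}^3, using the r = 6 line of (26)."""
    n = F(n)
    mu6_n = (15 * mu[2] ** 3 / n**3
             + 5 * (3 * mu[2] * mu[4] + 2 * mu[3] ** 2 - 9 * mu[2] ** 3) / n**4
             + (mu[6] - 15 * mu[2] * mu[4] - 10 * mu[3] ** 2 + 30 * mu[2] ** 3) / n**5)
    return mu6_n / (mu[2] / n) ** 3

def ratio_first_order(mu, n, r):
    """1.3.5...(2r-1) times {1 + (1/N)[...]}: the two leading terms only."""
    double_factorial = 1
    for t in range(1, 2 * r, 2):
        double_factorial *= t
    excess = (mu[4] - 3 * mu[2] ** 2) / mu[2] ** 2
    skew_sq = mu[3] ** 2 / mu[2] ** 3
    bracket = F(r * (r - 1), 6) * excess + F(r * (r - 1) * (r - 2), 9) * skew_sq
    return double_factorial * (1 + bracket / F(n))
```

Here `mu` is a list of central moments `[mu_0, ..., mu_6]` with exact `Fraction` entries. For $r = 3$ the difference between the two functions is the single remaining term of order $1/N^2$, so both approach 15 as `n` grows.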

* The condition “$\frac{N \mu_i}{[N \mu_2]^{i/2}}$ tends, with increasing $N$, to zero for $i = 3, 4, 5, \ldots \infty$,” while sufficient,
is, as is well known, not necessary. From the form to which Liapounoff succeeded in reducing the
condition (see Proc. Imp. Acad. Sci. VIII series, vol. XII) it follows, among other consequences, that
the law of distribution of values of $X_{(N)}$ tends with increasing $N$ to the Gaussian, if $\frac{1}{N} \frac{\mu_4}{\mu_2^{2}}$ tends to
zero. Noting that

$$\frac{\mu_{4,(N)}}{\mu_{2,(N)}^{2}} = 3 + \frac{1}{N} \left[ \frac{\mu_4}{\mu_2^{2}} - 3 \right],$$

we see that this condition is satisfied if $\frac{\mu_{4,(N)}}{\mu_{2,(N)}^{2}}$ tends with increasing $N$ to 3. It is in this way that
Liapounoff’s results justify arguments based on the examination of the two moments only $\mu_{3,(N)}$ and
$\mu_{4,(N)}$ in deciding the question, whether the law of distribution of the values of $X_{(N)}$ tends to the
Gaussian with increasing $N$ or not. The assumption usually underlying such a procedure, viz. that
the law of distribution is the Gauss-Laplace law, if $\frac{\mu_3^{2}}{\mu_2^{3}} = 0$ and $\frac{\mu_4}{\mu_2^{2}} = 3$, is clearly inexact: the coincidence
of the values of two or even more moments does not guarantee the identity of the laws, but merely
compresses the possible divergence between them into limits which become narrower as the number of
coinciding moments increases (cf. the investigation by Chebysheff “On the Integral..., forming Approximations
to the Value of an Integral” in Oeuvres, t. II, and the related papers by A. A. Markoff;
cf. also T. J. Stieltjes, “Recherches sur les fractions continues” in Annales de la faculté de Toulouse
(1894)).


V


Equations (15), (16), (17) and (18) hold for all laws of distribution of the
variable $X$. If the law fulfils the conditions

$$\mu_{2i+1} = 0 \quad \text{for } i = 0, 1, 2, \ldots r,$$

then $\mu_{2r+1,(N)} = 0$, for all the coefficients $T_{2r+1,r-h}$ then vanish. As regards the
coefficients $T_{2r,r-h}$, they take the form:

$$T_{2r,r-h} = 1.3.5\ldots(2r-1) \sum_{i=1}^{h} r^{[-(h+i)]}\, 2^{h+i}\, \mu_2^{\,r-h-i} \sum \frac{\mu_{2h_1}^{j_1} \mu_{2h_2}^{j_2} \ldots \mu_{2h_f}^{j_f}}{j_1!\, j_2! \ldots j_f!\; [(2h_1)!]^{j_1} [(2h_2)!]^{j_2} \ldots [(2h_f)!]^{j_f}},$$

where the last summation extends to all positive integer values $j_1, j_2, \ldots j_f$, satisfying
the condition: $j_1 + j_2 + \ldots + j_f = i$, and to all integer values of $h_1, h_2, \ldots h_f$,
satisfying the conditions:

$$2 \le h_1 < h_2 < \ldots < h_f,$$

$$h_1 j_1 + h_2 j_2 + \ldots + h_f j_f = h + i.$$

If the law of distribution of values of $X$ also satisfies the conditions:

$$\mu_{2i} = 1.3.5\ldots(2i-1)\, \mu_2^{\,i} \quad \text{for } i = 1, 2, 3, \ldots r,$$

then (cf. Introduction (5), (8) and (16))

$$T_{2r,r-h} = 1.3.5\ldots(2r-1) \sum_{i=1}^{h} r^{[-(h+i)]}\, 2^{h+i}\, \mu_2^{\,r} \sum \frac{[1.3.5\ldots(2h_1-1)]^{j_1} [1.3.5\ldots(2h_2-1)]^{j_2} \ldots [1.3.5\ldots(2h_f-1)]^{j_f}}{j_1!\, j_2! \ldots j_f!\; [(2h_1)!]^{j_1} [(2h_2)!]^{j_2} \ldots [(2h_f)!]^{j_f}}$$

$$= 1.3.5\ldots(2r-1)\, \mu_2^{\,r} \sum_{i=1}^{h} r^{[-(h+i)]} \sum \frac{1}{j_1!\, j_2! \ldots j_f!\; [h_1!]^{j_1} [h_2!]^{j_2} \ldots [h_f!]^{j_f}}$$

$$= 1.3.5\ldots(2r-1)\, \mu_2^{\,r}\, \alpha_{r,r-h},$$

and

$$\mu_{2r,(N)} = 1.3.5\ldots(2r-1)\, \mu_2^{\,r} \sum_{i=0}^{r-1} \frac{1}{N^{r+i}} \sum_{h=0}^{i} (-1)^{h}\, \alpha_{r,r-i+h}\, \beta_{r-i+h,h}$$

$$= 1.3.5\ldots(2r-1)\, \frac{\mu_2^{\,r}}{N^{r}} = 1.3.5\ldots(2r-1)\, \mu_{2,(N)}^{\,r}.$$

Thus in the case when the law of distribution of the values of $X$ for $i < r$
gives values of $\mu_i$ answering the Gauss-Laplace law, then the values $X_{(N)}$ follow
a law of distribution giving for $\mu_{i,(N)}$ for $i < r$ the same values as the law of Gauss.
Consequently if the law of distribution of values of $X$ is Gaussian, then the law of
distribution of values of $X_{(N)}$ is Gaussian also for all values of $N$.
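The closing statement of this section can be checked with exact arithmetic. The Python sketch below builds the coefficients $T_{r,j}$ from Gaussian central moments by a recurrence of the same form as (9) with $m_1 = 0$, and compares $T_{2r,r-h}$ with $1.3.5\ldots(2r-1)\,\mu_2^r\,\alpha_{r,r-h}$. The recurrence used for $\alpha$ and every function name are assumptions made for illustration.

```python
from fractions import Fraction as F
from functools import lru_cache
from math import comb, prod

def odd_product(i):
    """1.3.5...(2i-1), with the empty product equal to 1."""
    return prod(range(1, 2 * i, 2))

def gaussian_central(h, mu2):
    """mu_h of a Gaussian law: zero for odd h, 1.3.5...(h-1) mu_2^(h/2) for even h."""
    return F(0) if h % 2 else odd_product(h // 2) * F(mu2) ** (h // 2)

def alpha(k, i):
    """alpha_{k,i} by the recurrence alpha_{k,i} = i alpha_{k-1,i} + alpha_{k-1,i-1}."""
    if i < 1 or i > k:
        return 0
    if i == k or i == 1:
        return 1
    return i * alpha(k - 1, i) + alpha(k - 1, i - 1)

def falling(n, j):
    """N^{[-j]} = N (N-1) ... (N-j+1)."""
    return prod(n - t for t in range(j))

def make_T(mu2):
    """T(r, j): the coefficient R_{r,j} of Chapter I with m_h replaced by Gaussian mu_h."""
    @lru_cache(maxsize=None)
    def T(r, j):
        if j < 1 or j > r:
            return F(0)
        if j == 1:
            return gaussian_central(r, mu2)
        return sum(comb(r - 1, h) * gaussian_central(r - h, mu2) * T(h, j - 1)
                   for h in range(j - 1, r))
    return T

def mean_even_moment(r, n, mu2):
    """mu_{2r,(N)} = N^{-2r} sum_j N^{[-j]} T_{2r,j} for a Gaussian law."""
    T = make_T(mu2)
    return sum(falling(n, j) * T(2 * r, j) for j in range(1, r + 1)) / F(n) ** (2 * r)
```

If the reading is right, `mean_even_moment(r, n, mu2)` should equal `odd_product(r) * (F(mu2) / n) ** r` for every `n`, which is the Gaussian value of the moment of order $2r$ for variance $\mu_2 / N$.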
CHAPTER II
I
(1) Putting

$$\mu'_r = \frac{1}{N} \sum_{i=1}^{N} (X_i - m_1)^{r} \quad\ldots(1),$$

we have: $E\mu'_r = \mu_r$.

Noting that $\mu'_r$ is the arithmetic mean of the mutually independent quantities
$(X_1 - m_1)^r, (X_2 - m_1)^r, \ldots (X_N - m_1)^r$,
and that $E(X_i - m_1)^r = \mu_r$,
while $E(X_i - m_1)^{hr} = \mu_{hr}$,
we find from formulae (4) and (5) of Chapter I, when we put $r = m$, and replace
$m_{r,(N)}$ by $E\mu'^{\,m}_r$ and $m_h$ by $\mu_{hr}$:

$$E\mu'^{\,m}_r = \frac{1}{N^{m}} \sum_{h=1}^{m} R_{(m,h)}\, N^{[-h]} = \frac{1}{N^{m}} \sum_{h=0}^{m-1} R_{(m,m-h)}\, N^{[-(m-h)]}$$

$$= \frac{1}{N^{m}} \sum_{i=1}^{m} N^{i} \sum_{h=0}^{m-i} (-1)^{m-h-i}\, \beta_{m-h,m-h-i}\, R_{(m,m-h)} \quad\ldots(2),$$

where

$$R_{(m,m-h)} = \sum_{i} m^{[-(h+i)]}\, \mu_r^{\,m-h-i} \sum \frac{\mu_{h_1 r}^{j_1} \mu_{h_2 r}^{j_2} \ldots \mu_{h_f r}^{j_f}}{j_1!\, j_2! \ldots j_f!\; [h_1!]^{j_1} [h_2!]^{j_2} \ldots [h_f!]^{j_f}} \quad\ldots(3),$$

where the last summation extends to all positive integer values of $j_1, j_2, \ldots j_f$
satisfying the condition: $j_1 + j_2 + \ldots + j_f = i$, and to all integer values of $h_1, h_2, \ldots h_f$,
satisfying the conditions:

$$2 \le h_1 < h_2 < \ldots < h_f,$$
$$h_1 j_1 + h_2 j_2 + \ldots + h_f j_f = h + i.$$

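Hence:

$$R_{(m,m)} = \mu_r^{\,m},$$
$$R_{(m,m-1)} = C_m^{2}\, \mu_r^{\,m-2} \mu_{2r},$$
$$R_{(m,m-2)} = 3 C_m^{4}\, \mu_r^{\,m-4} \mu_{2r}^{2} + C_m^{3}\, \mu_r^{\,m-3} \mu_{3r} \quad\ldots(4),$$
$$R_{(m,m-3)} = 15 C_m^{6}\, \mu_r^{\,m-6} \mu_{2r}^{3} + 10 C_m^{5}\, \mu_r^{\,m-5} \mu_{2r} \mu_{3r} + C_m^{4}\, \mu_r^{\,m-4} \mu_{4r},$$
$$R_{(m,m-4)} = 105 C_m^{8}\, \mu_r^{\,m-8} \mu_{2r}^{4} + 105 C_m^{7}\, \mu_r^{\,m-7} \mu_{2r}^{2} \mu_{3r} + 15 C_m^{6}\, \mu_r^{\,m-6} \mu_{2r} \mu_{4r} + 10 C_m^{6}\, \mu_r^{\,m-6} \mu_{3r}^{2} + C_m^{5}\, \mu_r^{\,m-5} \mu_{5r},$$

and, on the other hand

$$R_{(m,1)} = \mu_{mr},$$
$$R_{(m,2)} = \tfrac{1}{2} \sum_{h=1}^{m-1} C_m^{h}\, \mu_{hr}\, \mu_{(m-h)r} \quad\ldots(5),$$
$$R_{(m,i)} = \sum_{h=i-1}^{m-1} C_{m-1}^{h}\, \mu_{(m-h)r}\, R_{(h,i-1)}.$$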
Substituting in (2), we find:

$$E\mu'^{\,m}_r = \mu_r^{\,m} + \frac{1}{N}\, \frac{m(m-1)}{2}\, \mu_r^{\,m-2}\, [\mu_{2r} - \mu_r^{2}]$$

$$+ \frac{1}{N^{2}} \left\{ \frac{m^{[-4]}}{8}\, \mu_r^{\,m-4}\, [\mu_{2r} - \mu_r^{2}]^{2} + \frac{m^{[-3]}}{6}\, \mu_r^{\,m-3}\, [\mu_{3r} - 3 \mu_{2r} \mu_r + 2 \mu_r^{3}] \right\} + \ldots \quad (6).$$

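(2) $E\mu'^{\,m}_r$ may also be calculated by means of recurrence formulae. Put

$$\nu^{(m)}_{r,(N)} = E N^{m} \mu'^{\,m}_r = E \left[ \sum_{i=1}^{N} (X_i - m_1)^{r} \right]^{m} \quad\ldots(7).$$

From relation (8) of the first chapter we find:

$$\nu^{(m)}_{r,(N)} = N \left\{ \mu_{mr} + \sum_{h=1}^{m-1} C_{m-1}^{h}\, \mu_{(m-h)r}\, \nu^{(h)}_{r,(N-1)} \right\} \quad\ldots(8).$$

Noting that $\nu^{(1)}_{r,(N)} = N\mu_r$, we find hence:

$$\nu^{(2)}_{r,(N)} = N \mu_{2r} + N^{[-2]} \mu_r^{2} = N\,[\mu_{2r} - \mu_r^{2}] + N^{2} \mu_r^{2},$$

$$\nu^{(3)}_{r,(N)} = N \mu_{3r} + N^{[-2]}\, 3 \mu_{2r} \mu_r + N^{[-3]} \mu_r^{3} \quad\ldots(9),$$

$$\nu^{(4)}_{r,(N)} = N \mu_{4r} + N^{[-2]}\,[4 \mu_{3r} \mu_r + 3 \mu_{2r}^{2}] + N^{[-3]}\, 6 \mu_{2r} \mu_r^{2} + N^{[-4]} \mu_r^{4},$$

and, in general,

$$\nu^{(m)}_{r,(N)} = \sum_{i=1}^{m} N^{[-i]} D^{(m)}_{i} \quad\ldots(10),$$

where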
(m) \ 
Dd, = Mmr 
(m) 
Pam = fy™ 
(m) _ 5? Ah (h) 
atl = 2 On Him—wrD, 4) (11), 
nt ee ee (m—2) (m-1) _ an m—2 (m—1) 
pO a (m 1) por Dina + Mr ee (m 1) par br + Br Dia- 
= On? May er? 
m—-2  __ ns? Co / aot 3c" 24, m—4 
—_- Mar sun m-1-j Pr Pim—s—j r m Par br 
and so on. 


Substituting in (10), we obtain 
y'm) =NF y’™ 


py se =] 71 —m) by” 4. Ni- (m0) Cn? fie py 


m-3 2 . 
+ Ni-in—4) |b =C ij Bry? bim-s—j r + 3C nt Meor* pont FH wee) oe (12). 


j=0 m= 


m mh A = 
+ N44 > jee Pan--h) r Phe + N mr 
h=1 
(3) When $r = 2$, in the case when the law of distribution is Gaussian, we find
from (8):

$$\nu^{(m)}_{2,(N)} = N \left\{ 1.3.5\ldots(2m-1)\, \mu_2^{\,m} + \sum_{h=1}^{m-1} C_{m-1}^{h}\, 1.3.5\ldots(2m-2h-1)\, \mu_2^{\,m-h}\, \nu^{(h)}_{2,(N-1)} \right\}.$$

Hence noting that $\nu^{(1)}_{2,(N-1)} = (N-1)\mu_2$, we find:

$$\nu^{(2)}_{2,(N)} = N(N+2)\, \mu_2^{2},$$

$$\nu^{(3)}_{2,(N)} = N(N+2)(N+4)\, \mu_2^{3},$$

$$\nu^{(4)}_{2,(N)} = N(N+2)(N+4)(N+6)\, \mu_2^{4}.$$

Let us assume that for all values $i \le m$,

$$\nu^{(i)}_{2,(N)} = N(N+2)(N+4)\ldots(N+2i-2)\, \mu_2^{\,i}.$$

In this case:

$$\nu^{(m+1)}_{2,(N)} = N \mu_2^{\,m+1} \left\{ 1.3.5\ldots(2m+1) + \sum_{h=1}^{m} C_m^{h}\, 1.3.5\ldots(2m-2h+1)\,(N-1)(N+1)\ldots(N+2h-3) \right\},$$

$$\nu^{(m)}_{2,(N)} = N \mu_2^{\,m} \left\{ 1.3.5\ldots(2m-1) + \sum_{h=1}^{m-1} C_{m-1}^{h}\, 1.3.5\ldots(2m-2h-1)\,(N-1)(N+1)\ldots(N+2h-3) \right\},$$

and, consequently,

$$(N+2m)\, \mu_2\, \nu^{(m)}_{2,(N)} = N \mu_2^{\,m+1} \Big\{ 1.3.5\ldots(2m-1)(2m+1) + (N-1)\, 1.3.5\ldots(2m-1)$$

$$+ \sum_{h=1}^{m-1} C_{m-1}^{h}\, 1.3.5\ldots(2m-2h-1)(2m-2h+1)\,(N-1)(N+1)\ldots(N+2h-3)$$

$$+ \sum_{h=1}^{m-1} C_{m-1}^{h}\, 1.3.5\ldots(2m-2h-1)\,(N-1)(N+1)\ldots(N+2h-1) \Big\}$$

$$= N \mu_2^{\,m+1} \left\{ 1.3.5\ldots(2m+1) + \sum_{h=1}^{m} C_m^{h}\, 1.3.5\ldots(2m-2h+1)\,(N-1)(N+1)\ldots(N+2h-3) \right\}$$

$$= \nu^{(m+1)}_{2,(N)}.$$

Thus when the law of distribution of values of $X$ is Gaussian,

$$E\mu'^{\,m}_{2} = \frac{\mu_2^{\,m}}{N^{m}}\, N(N+2)(N+4)\ldots(N+2m-2) \quad \text{for } m = 0, 1, 2, 3, \ldots \infty \quad (13).$$


Biometrika, x11 











162 Expectation of Moments of Frequency Distributions 


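(4) When $r = 3$, in the case of a Gaussian distribution of the values of X,

$$\nu^{(m)}_{3,(N)} = N\Big\{\mu_{3m} + \sum_{h=1}^{m-1} C_{m-1}^{h}\,\mu_{3(m-h)}\,\nu^{(h)}_{3,(N-1)}\Big\}.$$

Noting that $\nu^{(1)}_{3,(N)} = 0$, if the distribution of the values of X be Gaussian,
we find:

$$\begin{aligned}
\nu^{(2)}_{3,(N)} &= N\,.\,1.3.5\,\mu_2^3,\\
\nu^{(3)}_{3,(N)} &= 0 \quad \text{and, in general,} \quad \nu^{(2i+1)}_{3,(N)} = 0,\\
\nu^{(4)}_{3,(N)} &= N\mu_2^6\;1.3.5.9\,[5N + 72],\\
\nu^{(6)}_{3,(N)} &= N\mu_2^9\;3.5.9.15\,[25N^2 + 1080N + 15912].
\end{aligned}$$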
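When $r = 4$, for the case of a Gaussian distribution of the values of X, we
have:

$$\nu^{(m)}_{4,(N)} = N\Big\{1.3.5\ldots(4m-1)\,\mu_2^{2m} + \sum_{h=1}^{m-1} C_{m-1}^{h}\,1.3.5\ldots(4m-4h-1)\,\mu_2^{2(m-h)}\,\nu^{(h)}_{4,(N-1)}\Big\},$$

$$\begin{aligned}
\nu^{(1)}_{4,(N-1)} &= (N-1)\mu_4 = 3(N-1)\mu_2^2,\\
\nu^{(2)}_{4,(N)} &= N\mu_2^4\;3\,[3N + 32],\\
\nu^{(3)}_{4,(N)} &= N\mu_2^6\;27\,[N^2 + 32N + 352].
\end{aligned}$$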
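The closed forms quoted for $r = 2$, 3 and 4 can be checked mechanically. The following is a minimal sketch, not part of the memoir: it assumes that the recurrence has the form $\nu^{(m)}_{r,(N)} = N\{\mu_{mr} + \sum_{h=1}^{m-1} C_{m-1}^h \mu_{(m-h)r}\,\nu^{(h)}_{r,(N-1)}\}$, takes $\mu_2 = 1$ so that the Gaussian moments are $\mu_{2k} = 1.3.5\ldots(2k-1)$, and uses the illustrative names `gaussian_moment` and `nu`.

```python
from functools import lru_cache
from math import comb


def gaussian_moment(k):
    """k-th moment about the mean of a Gaussian variate with mu_2 = 1."""
    if k % 2:
        return 0  # odd moments vanish
    result = 1
    for j in range(1, k, 2):  # 1.3.5...(k-1)
        result *= j
    return result


@lru_cache(maxsize=None)
def nu(m, r, n):
    """E[(sum over n values of (X - mean)^r)^m], built up by the assumed recurrence."""
    if m == 0:
        return 1
    if n == 0:
        return 0  # an empty sum raised to a positive power
    total = gaussian_moment(m * r)  # the term mu_{mr}
    for h in range(1, m):
        total += comb(m - 1, h) * gaussian_moment((m - h) * r) * nu(h, r, n - 1)
    return n * total


for n in range(1, 8):
    # r = 2: the product N(N+2)(N+4)(N+6), i.e. formula (13) with m = 4
    assert nu(4, 2, n) == n * (n + 2) * (n + 4) * (n + 6)
    # r = 3: odd powers vanish, even powers are polynomials in N
    assert nu(3, 3, n) == 0
    assert nu(4, 3, n) == n * 1 * 3 * 5 * 9 * (5 * n + 72)
    assert nu(6, 3, n) == n * 3 * 5 * 9 * 15 * (25 * n**2 + 1080 * n + 15912)
    # r = 4
    assert nu(2, 4, n) == n * 3 * (3 * n + 32)
    assert nu(3, 4, n) == n * 27 * (n**2 + 32 * n + 352)
```

Integer arithmetic is used throughout, so the comparisons are exact rather than approximate.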


II 


Replacing r by m and py) by H(w',—p,)™ in formulae (22) and (28) of 
h 
Chapter I, and replacing yw, by = (— 1)* C)* w,* wa_», and putting 
k=0 


h 
2, (— 1) Oh* w* wawr = Xn, 
we find: 
E (p> — py" = 1.3.5... (2m—1) | ye x," 


1 : < 
+ yan [Am] x." »F vs daul-9) Xx, "-3 DB ‘oo tml] X;") 
Nw a mi) X."-3 X_ + gm! Xo" X, X54 ym) X"— X2> (14). 


~vy m-—l 
+ yu mil D Fe Xx; X os 12 


Amie) 
iis 





mi) A ».@ s ake mi-*) X.m-* X;! 


3m = 1 
24 








mi-3| >. Fe X; + ~ mi) x] + A | 
E(u’, — p,m = 1.3.5... . (2m + 1) \ a 
1 a 


7 Nouv som) agen xX; il pgm) is X,X, 


tm X,"— X, 


m 
+ gym Xo X3- ra mI Xe X. | 


1 g 
: Nints ato m3) Xam—* X, + ato mi-) Xm X;X, 


+ qhy ml ¥,"-1 XX, } (15), 
=F init X24 tami) Xym-4 XX 3 








+ qigm) X." 9X2 X,— 








“0 
+ gm X,"-* X22 X,— S—De=> mi X,"-3 X,X, 
+ qagg mm!) X"—" X55 
by (m—1)(m—2)(m—4) re m (38m —1) y 
162 ESS ee he +. 
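Hence:

$$\begin{aligned}
E(\mu_r' - \mu_r)^2 &= \frac{1}{N}\,[\mu_{2r} - \mu_r^2],\\
E(\mu_r' - \mu_r)^3 &= \frac{1}{N^2}\,[\mu_{3r} - 3\mu_{2r}\mu_r + 2\mu_r^3] \qquad (16),\\
E(\mu_r' - \mu_r)^4 &= \frac{3}{N^2}\,[\mu_{2r} - \mu_r^2]^2 + \frac{1}{N^3}\,[\mu_{4r} - 4\mu_{3r}\mu_r - 3\mu_{2r}^2 + 12\mu_{2r}\mu_r^2 - 6\mu_r^4].
\end{aligned}$$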
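The three results grouped under (16) are exact for every N, so they can be tested by direct enumeration on a small discrete distribution. The sketch below is illustrative only: the distribution `example` and the function names are assumptions, and $\mu_r'$ is taken, as in (7), to be the mean of the r-th powers of the deviations from the true mean.

```python
from fractions import Fraction
from itertools import product


def moment(dist, k):
    """mu_k = E (X - mean)^k for a discrete distribution {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    return sum(p * (x - mean) ** k for x, p in dist.items())


def exact_deviation_moment(dist, r, n, power):
    """E (mu'_r - mu_r)^power, found by enumerating every ordered sample of size n."""
    mean = sum(p * x for x, p in dist.items())
    mu_r = moment(dist, r)
    total = Fraction(0)
    for sample in product(dist.items(), repeat=n):
        prob = Fraction(1)
        s = Fraction(0)
        for x, p in sample:
            prob *= p
            s += (x - mean) ** r
        total += prob * (s / n - mu_r) ** power
    return total


def formula_16(dist, r, n):
    """Second, third and fourth moments of mu'_r about mu_r, as written in (16)."""
    m1, m2, m3, m4 = (moment(dist, k * r) for k in (1, 2, 3, 4))
    second = (m2 - m1**2) / n
    third = (m3 - 3 * m2 * m1 + 2 * m1**3) / n**2
    fourth = (3 * (m2 - m1**2) ** 2 / n**2
              + (m4 - 4 * m3 * m1 - 3 * m2**2 + 12 * m2 * m1**2 - 6 * m1**4) / n**3)
    return second, third, fourth


# A hypothetical three-valued variate, chosen only to be asymmetric.
example = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 3), Fraction(3): Fraction(1, 6)}

for r in (1, 2, 3):
    for n in (1, 2, 3):
        exact = tuple(exact_deviation_moment(example, r, n, k) for k in (2, 3, 4))
        assert exact == formula_16(example, r, n)
```

Rational arithmetic (`Fraction`) keeps the comparison exact; the enumeration grows as $3^N$, so it is only practical for very small samples.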


III 
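(1) Noting that

$$\begin{aligned}
E\mu'_{r_1}\mu'_{r_2} &= \frac{1}{N^2}\,E\Big\{\sum_{i=1}^{N}(X_i - m_1)^{r_1+r_2} + \sum_{i=1}^{N}\sum_{j \ne i}(X_i - m_1)^{r_1}(X_j - m_1)^{r_2}\Big\}\\
&= \frac{1}{N^2}\,\{N\mu_{r_1+r_2} + N^{[-2]}\mu_{r_1}\mu_{r_2}\} = \mu_{r_1}\mu_{r_2} + \frac{1}{N}\,[\mu_{r_1+r_2} - \mu_{r_1}\mu_{r_2}],
\end{aligned}$$

we find:

$$E(\mu'_{r_1} - \mu_{r_1})(\mu'_{r_2} - \mu_{r_2}) = E\mu'_{r_1}\mu'_{r_2} - \mu_{r_1}\mu_{r_2} = \frac{1}{N}\,[\mu_{r_1+r_2} - \mu_{r_1}\mu_{r_2}].$$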
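Similarly we find:

$$\begin{aligned}
E\mu'_{r_1}\mu'_{r_2}\mu'_{r_3} &= \frac{1}{N^3}\,\{N\mu_{r_1+r_2+r_3} + N^{[-2]}[\mu_{r_1+r_2}\mu_{r_3} + \mu_{r_1+r_3}\mu_{r_2} + \mu_{r_2+r_3}\mu_{r_1}] + N^{[-3]}\mu_{r_1}\mu_{r_2}\mu_{r_3}\}\\
&= \mu_{r_1}\mu_{r_2}\mu_{r_3} + \frac{1}{N}\,[\mu_{r_1+r_2}\mu_{r_3} + \mu_{r_1+r_3}\mu_{r_2} + \mu_{r_2+r_3}\mu_{r_1} - 3\mu_{r_1}\mu_{r_2}\mu_{r_3}]\\
&\quad + \frac{1}{N^2}\,[\mu_{r_1+r_2+r_3} - (\mu_{r_1+r_2}\mu_{r_3} + \mu_{r_1+r_3}\mu_{r_2} + \mu_{r_2+r_3}\mu_{r_1}) + 2\mu_{r_1}\mu_{r_2}\mu_{r_3}],
\end{aligned}$$

$$E(\mu'_{r_1} - \mu_{r_1})(\mu'_{r_2} - \mu_{r_2})(\mu'_{r_3} - \mu_{r_3}) = \frac{1}{N^2}\,[\mu_{r_1+r_2+r_3} - \mu_{r_1+r_2}\mu_{r_3} - \mu_{r_1+r_3}\mu_{r_2} - \mu_{r_2+r_3}\mu_{r_1} + 2\mu_{r_1}\mu_{r_2}\mu_{r_3}],$$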


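$$\begin{aligned}
E\mu'_{r_1}\mu'_{r_2}\mu'_{r_3}\mu'_{r_4} = \frac{1}{N^4}\,\{&N\mu_{r_1+r_2+r_3+r_4}\\
&+ N^{[-2]}[\mu_{r_1+r_2+r_3}\mu_{r_4} + \mu_{r_1+r_2+r_4}\mu_{r_3} + \mu_{r_1+r_3+r_4}\mu_{r_2} + \mu_{r_2+r_3+r_4}\mu_{r_1}]\\
&+ N^{[-2]}[\mu_{r_1+r_2}\mu_{r_3+r_4} + \mu_{r_1+r_3}\mu_{r_2+r_4} + \mu_{r_1+r_4}\mu_{r_2+r_3}]\\
&+ N^{[-3]}[\mu_{r_1+r_2}\mu_{r_3}\mu_{r_4} + \mu_{r_1+r_3}\mu_{r_2}\mu_{r_4} + \mu_{r_1+r_4}\mu_{r_2}\mu_{r_3} + \mu_{r_2+r_3}\mu_{r_1}\mu_{r_4}\\
&\qquad\quad + \mu_{r_2+r_4}\mu_{r_1}\mu_{r_3} + \mu_{r_3+r_4}\mu_{r_1}\mu_{r_2}]\\
&+ N^{[-4]}\mu_{r_1}\mu_{r_2}\mu_{r_3}\mu_{r_4}\}
\end{aligned}$$

$$\begin{aligned}
= \mu_{r_1}\mu_{r_2}\mu_{r_3}\mu_{r_4} &+ \frac{1}{N}\,[\mu_{r_1+r_2}\mu_{r_3}\mu_{r_4} + \mu_{r_1+r_3}\mu_{r_2}\mu_{r_4} + \mu_{r_1+r_4}\mu_{r_2}\mu_{r_3} + \mu_{r_2+r_3}\mu_{r_1}\mu_{r_4}\\
&\qquad + \mu_{r_2+r_4}\mu_{r_1}\mu_{r_3} + \mu_{r_3+r_4}\mu_{r_1}\mu_{r_2} - 6\mu_{r_1}\mu_{r_2}\mu_{r_3}\mu_{r_4}]\\
&+ \frac{1}{N^2}\,[\mu_{r_1+r_2+r_3}\mu_{r_4} + \mu_{r_1+r_2+r_4}\mu_{r_3} + \mu_{r_1+r_3+r_4}\mu_{r_2} + \mu_{r_2+r_3+r_4}\mu_{r_1}\\
&\qquad + \mu_{r_1+r_2}\mu_{r_3+r_4} + \mu_{r_1+r_3}\mu_{r_2+r_4} + \mu_{r_1+r_4}\mu_{r_2+r_3}\\
&\qquad - 3\,(\mu_{r_1+r_2}\mu_{r_3}\mu_{r_4} + \mu_{r_1+r_3}\mu_{r_2}\mu_{r_4} + \mu_{r_1+r_4}\mu_{r_2}\mu_{r_3} + \mu_{r_2+r_3}\mu_{r_1}\mu_{r_4}\\
&\qquad\quad + \mu_{r_2+r_4}\mu_{r_1}\mu_{r_3} + \mu_{r_3+r_4}\mu_{r_1}\mu_{r_2}) + 11\mu_{r_1}\mu_{r_2}\mu_{r_3}\mu_{r_4}]\\
&+ \frac{1}{N^3}\,[\mu_{r_1+r_2+r_3+r_4} - (\mu_{r_1+r_2+r_3}\mu_{r_4} + \mu_{r_1+r_2+r_4}\mu_{r_3} + \mu_{r_1+r_3+r_4}\mu_{r_2} + \mu_{r_2+r_3+r_4}\mu_{r_1})\\
&\qquad - (\mu_{r_1+r_2}\mu_{r_3+r_4} + \mu_{r_1+r_3}\mu_{r_2+r_4} + \mu_{r_1+r_4}\mu_{r_2+r_3})\\
&\qquad + 2\,(\mu_{r_1+r_2}\mu_{r_3}\mu_{r_4} + \mu_{r_1+r_3}\mu_{r_2}\mu_{r_4} + \mu_{r_1+r_4}\mu_{r_2}\mu_{r_3} + \mu_{r_2+r_3}\mu_{r_1}\mu_{r_4}\\
&\qquad\quad + \mu_{r_2+r_4}\mu_{r_1}\mu_{r_3} + \mu_{r_3+r_4}\mu_{r_1}\mu_{r_2}) - 6\mu_{r_1}\mu_{r_2}\mu_{r_3}\mu_{r_4}],
\end{aligned}$$

$$\begin{aligned}
E(\mu'_{r_1} &- \mu_{r_1})(\mu'_{r_2} - \mu_{r_2})(\mu'_{r_3} - \mu_{r_3})(\mu'_{r_4} - \mu_{r_4})\\
&= \frac{1}{N^2}\,\{(\mu_{r_1+r_2}\mu_{r_3+r_4} + \mu_{r_1+r_3}\mu_{r_2+r_4} + \mu_{r_1+r_4}\mu_{r_2+r_3})\\
&\qquad - (\mu_{r_1+r_2}\mu_{r_3}\mu_{r_4} + \mu_{r_1+r_3}\mu_{r_2}\mu_{r_4} + \mu_{r_1+r_4}\mu_{r_2}\mu_{r_3} + \mu_{r_2+r_3}\mu_{r_1}\mu_{r_4}\\
&\qquad\quad + \mu_{r_2+r_4}\mu_{r_1}\mu_{r_3} + \mu_{r_3+r_4}\mu_{r_1}\mu_{r_2}) + 3\mu_{r_1}\mu_{r_2}\mu_{r_3}\mu_{r_4}\}\\
&\quad + \frac{1}{N^3}\,\{\mu_{r_1+r_2+r_3+r_4} - (\mu_{r_1+r_2+r_3}\mu_{r_4} + \mu_{r_1+r_2+r_4}\mu_{r_3} + \mu_{r_1+r_3+r_4}\mu_{r_2} + \mu_{r_2+r_3+r_4}\mu_{r_1})\\
&\qquad - (\mu_{r_1+r_2}\mu_{r_3+r_4} + \mu_{r_1+r_3}\mu_{r_2+r_4} + \mu_{r_1+r_4}\mu_{r_2+r_3})\\
&\qquad + 2\,(\mu_{r_1+r_2}\mu_{r_3}\mu_{r_4} + \mu_{r_1+r_3}\mu_{r_2}\mu_{r_4} + \mu_{r_1+r_4}\mu_{r_2}\mu_{r_3} + \mu_{r_2+r_3}\mu_{r_1}\mu_{r_4}\\
&\qquad\quad + \mu_{r_2+r_4}\mu_{r_1}\mu_{r_3} + \mu_{r_3+r_4}\mu_{r_1}\mu_{r_2}) - 6\mu_{r_1}\mu_{r_2}\mu_{r_3}\mu_{r_4}\}.
\end{aligned}$$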
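The numerical factors that multiply the products of moments in these developments (3 and 2 in the third order; 6, 11 and 6 in the fourth) come from expanding the factorial symbols in powers of N. The sketch below assumes that $N^{[-k]}$ stands for the product $N(N-1)\ldots(N-k+1)$; the function name is illustrative.

```python
def falling_factorial_coefficients(k):
    """Coefficients c[0..k] such that N(N-1)...(N-k+1) = sum_j c[j] * N**j."""
    coeffs = [1]  # the empty product, i.e. the constant polynomial 1
    for i in range(k):
        # multiply the current polynomial p(N) by (N - i)
        shifted = [0] + coeffs                    # N * p(N)
        scaled = [-i * c for c in coeffs] + [0]   # -i * p(N)
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs


# N(N-1)(N-2)(N-3) = N^4 - 6 N^3 + 11 N^2 - 6 N
assert falling_factorial_coefficients(4) == [0, -6, 11, -6, 1]
# N(N-1)(N-2) = N^3 - 3 N^2 + 2 N
assert falling_factorial_coefficients(3) == [0, 2, -3, 1]
```

Dividing the fourth-order product by $N^4$ gives $1 - 6/N + 11/N^2 - 6/N^3$, which accounts for the multiples of $\mu_{r_1}\mu_{r_2}\mu_{r_3}\mu_{r_4}$ in the successive orders of the development of $E\mu'_{r_1}\mu'_{r_2}\mu'_{r_3}\mu'_{r_4}$.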

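The coefficient of $1/N^2$ in

$$E(\mu'_{r_1} - \mu_{r_1})(\mu'_{r_2} - \mu_{r_2})(\mu'_{r_3} - \mu_{r_3})(\mu'_{r_4} - \mu_{r_4})$$

can be expressed in the form

$$\begin{aligned}
&[\mu_{r_1+r_2} - \mu_{r_1}\mu_{r_2}]\,[\mu_{r_3+r_4} - \mu_{r_3}\mu_{r_4}] + [\mu_{r_1+r_3} - \mu_{r_1}\mu_{r_3}]\,[\mu_{r_2+r_4} - \mu_{r_2}\mu_{r_4}]\\
&\quad + [\mu_{r_1+r_4} - \mu_{r_1}\mu_{r_4}]\,[\mu_{r_2+r_3} - \mu_{r_2}\mu_{r_3}]\\
&= N^2\,\{[E(\mu'_{r_1} - \mu_{r_1})(\mu'_{r_2} - \mu_{r_2})]\,.\,[E(\mu'_{r_3} - \mu_{r_3})(\mu'_{r_4} - \mu_{r_4})]\\
&\qquad + [E(\mu'_{r_1} - \mu_{r_1})(\mu'_{r_3} - \mu_{r_3})]\,.\,[E(\mu'_{r_2} - \mu_{r_2})(\mu'_{r_4} - \mu_{r_4})]\\
&\qquad + [E(\mu'_{r_1} - \mu_{r_1})(\mu'_{r_4} - \mu_{r_4})]\,.\,[E(\mu'_{r_2} - \mu_{r_2})(\mu'_{r_3} - \mu_{r_3})]\}^{*}.
\end{aligned}$$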
(2) In the general case, putting 
Br, Bre oe Pye 0 (i)> 
Ky, Bry see Mg = II", 
T+. +...+ 7 =T7, 
and agrecing to denote 
t-h+1 i~h+2 i-h+3 i 


> 2 oo a 
A=1 frx=fitl fr=fitl Sh=Si-- 141 S.(h) 


we find: 
AT} 1 : (i) 1 i i) 
El’ =,, > NAA == = Neen H! =H” 
ne 1 i Nt j=0 i,7 
sg 1 ] 
+ S52 (DEB eH see: 
Here, ry = II (i) 


and hie may, when 27 <7, be written in the form 
j a. 
W4° 3 2 hoe 
Po aen GP! Mg, Berg, + Porgy 
and, when 2) > 7, in the form 
H® = > s KR Tw 
an WATT. ae oe 
where xr, - 





Pry oe Brgy 

Pry tr gt tt psy 

and Ky, denotes the sum of all products of / factors of type pn, Ming --- May 
possible, subject to the condition that /,, 2, ... hj appear in their correct order, in 
sums of not less than two terms taken from the numbers 7;,, 77,, ... Th...) and that 
the number of summands composing /, is not greater than the number composing 
h,,,. We have in this manner: 





g? 2 £28 ecto 
i,i-—1 ie i rts 
A=) fr=f,+1 Pry Pry 
~~ ‘Z S Thy 
is-4 4 Bry+rg,tf, a 
, AAVW=AHitl ffi Berg, Berg, Berg, 
#-8 i-2 i-1 


+z & = S his aro Pere are ee Mice te 
asPuicanaen" Ib TONG GEN 





+ Mey ory, bry try, =e 

+r vet+r 7 
THI Ia Ss Bory, berg, Pry, Bry,” 

* Cf. Soper, “On the Probable Error of the Correlation Coefficient to a Second Approximation,”
p. 97 (Biometrika, Vol. IX).
a I 
i,i-3 bre +rg.tryg tr 
4-8 4) Mery, Pry, Berg, Mary EE ITE 
II 
+ > Core try Bry try + + bey sry Mey try ary + +++] 
S, (5) Bry, Pry Pry, Pry, Pry, *h, "fe "Ss Ls7"Ss Sits SoG Ss 


" Iw 
S. (8) Pry, Porg, Pry, Pry bry, Pry, 





[ory ny, Mrytry, Mrytry, + bry try, Brg try, Pry try, t+ -] 


and, on the other hand 


(3) 
H; 1 > Prytret...tri = Pro 


i * i-1 2 
Hyg 2 pry rang t ZS Mrgctrg brmtrg tg) + + 
2i-k+1 2—k+2 2 
+ = » eee > Pop +1G +475, Brig t175,4475,) +... 


A=1 fe=jit1 Se=Se-1+1 
i+1 7+2 2 


+ 2 = Si = Ber 5. +05+...475, Pr—lrg +13, 4+-.475 1? 
j=l fo=iyt jemhi-r ti S52 Ji IG: Ji 


(Qi+1) 2i+1 3 2i+1 
Ass; . x Bry Mr—rg + = = Mery +15, Pr—ir5,+75,] * 
? j=1 Ri=1 jo=i,+1 5 
44+2 i+38 2i+1 
+ % z tee 2 Berg +15,+-405, Pre[ej try +47 gl 
A=1 jo=jitl Ji=hi-.41 i 


In the case in which 7,=7,=7,=... = 1;, the coefficients x become Ry, 5 


and we get again formulae (2), (4) and (5) of the present chapter. 
(3) Let us agree to denote
$E(\mu'_{r_1} - \mu_{r_1})(\mu'_{r_2} - \mu_{r_2}) \ldots (\mu'_{r_i} - \mu_{r_i})$ by $E_{(i)}\,d\mu$.
We have: 
By dp = EM’ - RF {TT (y/H'r;,1 * 2 Pr; Pry, E (0 o/b'rs, M's5)} “4 
j, 
HUES pny tng Hayy BAM gy Mego Mig) + 
js 
+(—i7".. 2) Pry, Prj, +++ Brg, K {N/ Brg, Mss, ihe, Bry.) 
5.G- 


+ (—1)° @-1) T1q. 





Using here the values found above for ETI’), £ {T,,/ rs}, and so on, and denoting 


by Ki"A'" "ha the sum of all products of J factors of type pa, pr,-..4% 
possible, subject to the conditions that h,, he, ... hy appear in order, that the sums 
contain not less than two summands chosen from the numbers 1, 72, ... %_,4,,, and 
that the number of summands in fh; is not less than in /,,,, we find that the 
coefficient of 1/N* in the development of E'; du equals: 
2i-t-1 t-1 
(=D! Hey 2-1 OP Bane + & (UP Ba ttt 


t—k (or 2i-¢+k) | Re 
x > > (2i) Kf Th aa) 
t=] At k+D) Bry Pry Pry 
t-1 
- = py = (—1) Bo eiee 


~k (or ¢-1- II 
t—k (or %—1-t+k) - 2 |My, 





Kis, 9M fas os Thay) 





2 
t=1 S,.¢-k+-D Bry, Prj,° ee Pie 


t-1 
—1¥ 8. 
+4. Br;, Pr; =, ( 1) Boit4n—2,% ih 
a ae) ll Pees > 
t vols “ig t+k) S (20 [My Mery, 


x Kft" haw) +... 





l=1 S, (t-k+l) Bory, Pryys** Berg 


t-1 
+(-1) x Hs, Benjy s +> Bey, 2 1)* Boit+t—n,k 
i. - 


t-k (or %-h-t+k) Th tan | Phy, Pry ++ br 
5: = 2 4 =“ - Kt" had +... 
t=} A (¢-—k+1) Pry, Pig? **Fg 





t-1 
-(— } > Aileen S(-1 
’ ji @int-1) Th Pig ig. aX ¥ Besse 
Tl e/ 
t—k k (2i)/ Pp, Py, +++ By, 
S +1) s Tg, Te "Io4-t-1 Kft". ee 
bal f(t k+l) Bry Bry, 2+ 


x 





"hes ) 
where the summation for $l$ extends to all positive integer values of $l$ from 1 to the
least of the upper limits.


By the aid of simple transformations the coefficient of $1/N^t$ in the development
of $E_{(2i)}\,d\mu$ can be expressed in the form:

2i-t-1 
E (- 1) Il (2%) z (- 1) C, Bei-n,t 


h=0 

t-1 t—k 7 ae 

> (- 1¥ 23 ama Oo (2%) Kf&n Nfs i *Y, 543) 
k=Ent. (¢/2) l=1 f,(t-k+D Pry Pry, ee oe 


2i-t-1 
x D (1) Chg tye 1 Botte ne 


h=0 
Ent. (¢/2)—1 k+1 Tl : : 
+ & €1P 2 - Kft "tq hay) ...(18). 
k=0 t=1 f, (t-k+1) Pry, . a 


2i-t-1 
x =. (— 1)" C'o_tsn-1 Boitpe-h,k 





Ent. (¢/2)-1 t-2k-2 ai 
r > (- 1} > (2i) Kin ’ Thy ove Tyo _oy-j) 
k=0 j=0 f, (2t-2k—-j) Pry, os a 


Qi-2t+2k+j 


x 2 (-1y Choos 0b4j Boi-t4b—h,k 
h=0 / 
Noticing that (cf. Introduction, II, 3) 
r-k-1 
>> (- iy C,.* Bent =0 when m> 2k, 
h=0 


r-k-1 
S (-1) Cy" Bane =1.3.5...(2k—1), 
h=0 


e=ee8 n oie 
=, & 1) On» Byte = 2 Bi,g oe Pe ? 
= g= 
where $B_{i,g}$ has the value given in the Introduction (12), (13) and (14), we see that
the coefficient of $1/N^t$ in the development of $E_{(2i)}\,d\mu$ is zero, if $t < i$. The coefficient
of $1/N^i$ equals
i-1 
(—1)' Hep 1.8.5...(2%—1) + = (—1)1.3.5...(Qk—1) 
k=1 
TI gi 


yo SS ME” 4M ....(19), 
Fr (2-28) Bry Bangs Biry 2,2, ...2 2,2,...2 





where by . is denoted the sum of all the different products of type 


Mh, hy ++» Hh, possible, subject to the condition that /,, hy,... hi», appear as 


sums of pairs of the numbers 77, 17, «.+ Tyg_g- 


When 7, =r, =... =13;, the aggregate of terms denoted by M ae reduces to 
(2i — 2k — 1) (21 - 2k —3)...5. 3.1 pp", 
and 


1.3.5...(@%%-1). = en ye | 


» (2i—2k) Pry, Pry, vee “ 
| 3 5 . 2i-2k 2 sy> ’ ‘ i-k 
=1.3.5...(2k—1) C%-™ yu (25 — 2k -1)(2i- 2k-8)...5.3.1 pe 
=1.3.5...(23—1) 05" pw pr. 
The coefficient of $1/N^i$ consequently becomes (cf. (14) above):
$$1.3.5\ldots(2i-1)\,(\mu_{2r} - \mu_r^2)^i.$$


Similarly, we find that the coefficient of $1/N^t$ in the development of $E_{(2i+1)}\,d\mu$
reduces to
. 








2i-¢t 
(— 1) TT eta = (- 1p Chey: Bei+i—n,t 
=0 
- t-k of 0 . " 
+ = —1 > z (2t+1) K £09 haa) 
k= Ent. (¢/2) Tr S(t- kD Pry Bry ee Pry 
2i-¢ 
x = (— 1)" CY sig tpe-1 Boinr—t4e—h,k 
h=0 
Ent. (¢/2)-1 k ie 
/ (2i+1) \ 
+ = ‘—1¥ > = - Kin Tfgr-Th 14)  \ «e(20). 
k=0 l=1 f,(t-k+1) Pry, By, oss ~ | 
2i-t 
x = (— 1)" Chas tye-1 Boisstren, & 
=0 » 
Ent. (#/2)-1 t—2k-1 I et 
+ & (-1F & = mn ED AG hy Thay og 9) 
k=0 j=9 J,(Q2t-2k-j) Pry Pago? Pig, on, 
2i—2t+2k+j+1 . 
x 2 —-1¥C “sit +ek+j+1 Sei-t4k—h+i, k 





h=0 } 
When $t \le i$ the coefficient of $1/N^t$ in the development of $E_{(2i+1)}\,d\mu$ consequently
reduces to zero. When $t = i + 1$, noting that


i-1 
RES Cy, Buen r= = (—1) Ch, Bise-a,e=1.3.5...(2k—1), 


t-1 1 x 

RG baer Burne= = Byg Cf ,,=1.38.5...(2k-1) {((i-k +1) +3 (k-1)} 
2k—-1 

= 2,- 1 a Bisk-n,k: 


we find: 





(— 1¥11.3.5... (2541) 8% 


fl (2i+1) 





2k+2 


+ ° > a Be 
+ 2 (-1)1.8.5...(08-3)) = ——Heeo yoo 
k=1 ly, (2é-2k +1) Bry Piz, tee ee ok eg 
: R TT es = 
+[((@-k +1) +3(k-D)] wey as 
I, (2i -2k+2) Bry, Pry se Pry. Neb ' 


* (21), 








(i) 
s M, 2, ... 2, 3 





/ 
(i-k+1) : (i-k) 
where M,, ,, has the value. given above, and M 22.9.3 denotes an analogous 


sum of terms of type wn; “a, --- #n;_, With the single difference that in the aggre- 
gate h,_, there occur not two but three of the numbers ry,, 77, ... T5924,» While 
the remaining 2i — 24 —2 numbers of this series are distributed in pairs among 
AS Ras eee Aes. 


In the case $r_1 = r_2 = \ldots = r_{2i+1}$, the coefficient of $1/N^{i+1}$ in the development of
$E_{(2i+1)}\,d\mu$ becomes:


(— 1) 1.8.5...(26 +1) Ri we + (— 1)1.8:5...(25— 1) [1 + 8-1] CR, eg ae 


i-1 se es re : 
+ > (-1F1.8.5...(2h—1) {Ong a ws, * ee 1.8.8... oD 
k=1 
+[—k4+1)4+$(R- VD) Cy we wy 1.38.5... (Db — We + vf 


+ pe, ber Ce,,,1.8.5...(25— 8) 


$$= 1.3.5\ldots(2i+1)\,\frac{i}{3}\,[\mu_{2r} - \mu_r^2]^{i-1}\,.\,[\mu_{3r} - 3\mu_{2r}\mu_r + 2\mu_r^3].$$


(Cf. (15) above.) 


(To be continued.) 








MISCELLANEA. 


I. Preliminary Note on the Association of Steadiness and Rapidity
of Hand with Artistic Capacity.


By M. L. TILDESLEY, 
Crewdson-Benington Student, University College, London. 


(1) This preliminary note is based on observations made by Miss M. Dalgliesh at the request 
of Professor K. Pearson. As a teacher of drawing in a large school, she had a long experience of the 
amount of artistic imagination and artistic craft in her pupils, and was able to obtain appreciations 
of their other abilities. The categories she supplied for about 60 pupils were: (a) their ages*, 
(b) the number of years during which they had learnt drawing†, (c) their artistic or non-artistic
capacity, the former being subdivided into imagination and craft, (d) their mathematical, and
(e) their musical ability. The steadiness and rapidity of hand were to be tested by the well-known
“maze”-problem. Three mazes were prepared, I, II, III, of varying degree of tortuosity. The nature
of the problem was explained to the pupils. They were to enter the maze at A and leave at Q, a 
continuous pencil track being drawn from the point of entry to the point of exit. The performance 
was to be considered the more excellent the fewer the occasions on which the pencil track touched 
the boundaries of the maze path—such touching being termed a “bump.” The ideal pencil track 
would keep steadily in the mid-path and parallel to its borders. No distinction however in this 
preliminary experiment was made between a non-bumping wavy line and an ideal track. The 
efficiency due to keeping clear of the boundaries was simply determined by the number of bumps. 
The minimum number of bumps of any girl in any one maze was one and the maximum 72. 
Further, the performance was to be considered the more satisfactory the greater the celerity with 
which the track was completed. The minimum time (taken with a stop-watch) of any pupil in 
any one of the three mazes was 18 secs. and the maximum time practically 3 mins. Contrary to 
what might by some be anticipated, there was not a high negative correlation between the number
of bumps and the time taken. Although on the one hand an over-hasty temperament might lead 
to many bumps, on the other a certain celerity tends to straightness of path while hesitation leads 
to bumping. These points will be more easily grasped from the correlation results provided below. 


(2) A question arises as to the relative difficulty of the three mazes. Measured in time and
number of bumps, the average values are:

                                        Maze I           Maze II          Maze III
Average number of minutes taken      2.002 ± .043     1.208 ± .031     1.391 ± .037
Average number of bumps made         20.68 ± 1.297    15.39 ± .927     31.82 ± 1.392

These numbers, however, can hardly be taken as measuring the absolute difficulty of the three
mazes, for (i) they are not of equal length, and (ii) they have not the same number of changes of
direction. Approximately the following hold:

                                        Maze I           Maze II          Maze III
Length of mid-path                     1025 mm.          700 mm.          730 mm.
Number of changes of direction            128               94               84

and thus we have:

                                                 Maze I     Maze II    Maze III
Number of cms. described per minute              51.20      57.95      52.48
Number of bumps per ten changes of direction     1.616      1.637      3.788

* Their mean age was 14.43 years with a standard deviation of 2.07 years, the actual range being
from 10 to 18 years.

† The mean number of years during which the pupil had learnt drawing was 3.84 with a standard
deviation of 2.20, the actual range being from 1 to 8 years.
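The two derived rows follow from the averages by simple division, which can be sketched as below. The dictionary and function names are illustrative. The count of changes of direction for Maze I is poorly legible in the scanned source, so the sketch recovers it from the printed ratio 1.616 rather than assuming it.

```python
# Averages and maze dimensions as quoted in the note.
mean_minutes = {"I": 2.002, "II": 1.208, "III": 1.391}
mean_bumps = {"I": 20.68, "II": 15.39, "III": 31.82}
midpath_mm = {"I": 1025, "II": 700, "III": 730}
changes_of_direction = {"II": 94, "III": 84}  # Maze I is inferred below


def cm_per_minute(maze):
    """Length of mid-path in cm divided by the average time in minutes."""
    return midpath_mm[maze] / 10 / mean_minutes[maze]


def bumps_per_ten_changes(maze):
    """Average bumps per ten changes of direction."""
    return 10 * mean_bumps[maze] / changes_of_direction[maze]


# Number of changes of direction in Maze I implied by the printed ratio 1.616.
implied_changes_maze_1 = 10 * mean_bumps["I"] / 1.616

for maze in ("I", "II", "III"):
    print(maze, round(cm_per_minute(maze), 2))
for maze in ("II", "III"):
    print(maze, round(bumps_per_ten_changes(maze), 3))
print("Maze I changes of direction, implied:", round(implied_changes_maze_1))
```

The implied count for Maze I comes out very close to a whole number, which is what one expects if the printed ratio was formed in this way.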


Thus judged by time whether absolutely or relatively to its length Maze II was the easiest, 
Maze I the hardest. But against this must be set the fact that Maze I was taken first, and probably 
the pupils in this case proceeded with more caution. Absolutely and relatively to the number of 
changes of direction Maze III was the hardest. Maze I does not seem to have been harder than 
Maze II, if we judge its mean number of bumps relative to the number of changes of direction. 


Our total numbers being few it seemed at first desirable in order to reduce the probable errors 
of our results to treat each trial as an independent event and thus reach a total of over 150 cases. 
This possibility is, however, excluded by the difference in difficulty of the mazes; pooling would 
have produced spurious correlation. We were thus compelled to work out correlations for each 
maze, or to pool the total achievement of each pupil. We have sometimes adopted one and some- 
times the other method. 


To obtain a single general standard of efficiency in maze description we have taken as a com- 
bined measure the inverse product of the time taken and the number of bumps made. We shall 
speak of this as the “inverse product” simply. It receives some sort of justification, when we note 
that the factors are not highly correlated and further when we note that we desire a measure which 
shall increase with efficiency. 
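As a small illustration of this combined measure, the following sketch computes the inverse product for a single trial. The function name and the sample figures are hypothetical; the note records at least one bump in every trial, so division by zero does not arise in its data.

```python
def inverse_product(minutes_taken, bumps):
    """Efficiency measure: the reciprocal of (time taken x number of bumps)."""
    if minutes_taken <= 0 or bumps <= 0:
        raise ValueError("time and number of bumps must both be positive")
    return 1.0 / (minutes_taken * bumps)


# A quicker and steadier hypothetical trial scores higher than a slower, bumpier one.
assert inverse_product(1.0, 10) > inverse_product(2.0, 20)
```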


The fundamental problem we had in view is the following: To what extent are steadiness and 
rapidity of hand as exhibited in maze-tracing the result of training? to what extent are they innate? 

Before proceeding to the discussion of this problem we may note the variability in period of 
time and in number of bumps for the three mazes. 





                        Maze I                       Maze II                      Maze III
                  S.D.          C. of V.       S.D.          C. of V.       S.D.          C. of V.
Time in maze   .479 ± .031   23.91 ± 1.61   .340 ± .022   28.15 ± 1.93   .410 ± .026   29.48 ± 2.04
No. of bumps   14.39 ± .92   69.59 ± 6.22   10.29 ± .66   66.85 ± 5.86   15.44 ± .98   48.54 ± 3.75














These results seem to indicate a conformity with the general law that the harder the test the
greater is the scatter*, i.e. the weak fail more conspicuously and the able succeed more markedly.
This is a law manifested in most stiff competitive examinations, or again in the difficulty of making
marked distinctions in the case of easy papers. We are speaking here of the scatter or variability
as measured absolutely by the standard deviation. It is noteworthy that the relative variability
(or the variability as percentage of the mean value) as measured by the coefficient of variation
appears in the case of the bumps to be less in the case of the harder maze. We are thus driven to
the conclusion that the emphasis of the difference between the ineffectual and effectual in a given
task, while increasing with the stiffness of the task, does not increase proportionally to that stiffness,
but probably at some lesser rate.
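The contrast between absolute and relative variability can be reproduced from the printed means and standard deviations. The sketch below takes the coefficient of variation to be 100 × S.D. / mean, as the parenthesis in the text implies; the names are illustrative, and small differences from the printed values are to be expected because the inputs are rounded.

```python
# (mean, standard deviation) for each maze, as printed in the note.
time_in_maze = {"I": (2.002, 0.479), "II": (1.208, 0.340), "III": (1.391, 0.410)}
no_of_bumps = {"I": (20.68, 14.39), "II": (15.39, 10.29), "III": (31.82, 15.44)}


def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * sd / mean


for maze in ("I", "II", "III"):
    cv_time = coefficient_of_variation(*time_in_maze[maze])
    cv_bumps = coefficient_of_variation(*no_of_bumps[maze])
    print(f"Maze {maze}: C. of V. time {cv_time:.2f}, bumps {cv_bumps:.2f}")

# Maze III has the largest S.D. of bumps but the smallest C. of V. of bumps.
assert no_of_bumps["III"][1] == max(sd for _, sd in no_of_bumps.values())
assert coefficient_of_variation(*no_of_bumps["III"]) == min(
    coefficient_of_variation(*v) for v in no_of_bumps.values()
)
```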


(3) The first problem to be answered is: How far is steadiness of hand an individual character- 
istic at all? Will the same individual do well in one maze and badly in another? The answer to 
this problem lies in the pupils’ correlation in efficiency in performances in different mazes. Now 
whether the characteristic be acquired by training or be innate we should anticipate a change with 
age. Most innate characteristics grow stronger or weaker with age, and this must be taken into 
account. The following table gives the chief age correlations: 


* The high variabilities in the case of Maze I are, we think, due to the manner in which different
individuals attempted a novel task.
TABLE I.
Correlations with Age.

Characters                                        Maze I          Maze II         Maze III
No. of bumps and age                              [illegible]     [illegible]     [illegible]
Time taken and age                                −.241 ± .085    +.003 ± .090    +.152 ± .088
Inverse product and age                           +.402 ± .076    +.317 ± .081    +.467 ± .071
No. of bumps and age for constant time taken      −.607 ± .057    −.539 ± .064    −.542 ± .064
Time taken and age for No. of bumps constant      −.336 ± .080    −.101 ± .089    −.112 ± .089




This table shows at once a very considerable relationship between age and the number of
bumps made: the steadiness of hand increases with age. On the other hand in the case of Mazes II
and III no relationship between age and time taken was demonstrated; in Maze I, however, there
was possibly a slight relationship between time taken and age, of the opposite sense, however, to
the insignificant values in Mazes II and III, i.e. the lower the age the longer the time taken; this
is not improbably due to the novelty factor involved in Maze I. On the whole we may reasonably
conclude that the relation between time taken and age is not important. Confirmation of this 
arises in the case of the inverse product measure of efficiency and age. The efficiency increases 
with age, but because it includes time is not so marked as in the factor of bumps alone. The two 
remaining correlations indicate what happens if we eliminate respectively the influence of time 
taken and number of bumps. We hardly improve the relation between the number of bumps and 
age, if we make the time taken constant. On the other hand we get one significant but small 
correlation and two insignificant correlations, but all three are now of the same sign if we measure 
the relation between time taken and age for constant number of bumps. It is thus possible that 
there is a very slight relation between time taken and age—rapidity slightly increasing with age 
for a given degree of steadiness of hand. This leads us to the direct problem of the relationship 
between rapidity and steadiness of hand. 


TABLE II.

Correlations of Rapidity and Steadiness.

                                                       Maze I          Maze II         Maze III
No. of bumps and time taken                            −.052 ± .090    −.165 ± .088    −.432 ± .073
No. of bumps and time taken for constant age           −.246 ± .085    −.193 ± .087    −.422 ± .074
No. of bumps and time taken for time learnt constant   −.070 ± .090    −.164 ± .088    −.444 ± .072





Without regard to age, it is only in the case of Maze III that we can assert that the number of
bumps increases inversely with the time taken. Allowing for age the associations are more marked,
but by no means as intense as we had anticipated. Further they seem to be dependent on the
difficulty of the maze, i.e. the harder the maze the closer the relationship. A priori one might
imagine that a slow transit would escape bumps; it is so, but not in a very emphatic manner. We
suggest that a certain degree of rapidity is really helpful in avoiding bumps; it keeps a straight
course in the straighter parts of the mazes, while it is rapidity at the angles which is calculated to
produce bumps. There are probably therefore two factors at work.


We can now turn to the question of individuality in maze-tracing. We find the following 
correlations: 
TABLE III.
Individuality in Maze-Description.

Correlations                        Mazes I and II   Mazes I and III   Mazes II and III
No. of bumps                        +.754 ± .039     +.761 ± .029      +.876 ± .021
Time taken                          +.594 ± .058     +.436 ± .073      +.807 ± .030
Inverse product                     +.703 ± .046     +.661 ± .051      +.695 ± .047

No. of bumps, age constant          +.644 ± .053     +.648 ± .052      +.826 ± .029
Time taken, age constant            +.613 ± .056     +.493 ± .068      +.816 ± .030
Inverse product, age constant       +.663 ± .051     +.585 ± .059      +.652 ± .052





These are very noteworthy correlations and it is evident that there is a very marked degree of 
individuality in maze performances, whether we judge steadiness, rapidity or the combination of 
both involved in the inverse product measure of efficiency. These high correlations it is true are 
lessened, but still large, if we allow for age*. The existence of this marked individuality brings us 
then to our main problem. Are steadiness and rapidity of hand—which to an appreciable extent 
increase during childhood—products of training, i.e. of environment, or innate characters develop- 
ing with age? 

(4) How far does the length of time during which drawing has been learnt influence the 
rapidity and steadiness of hand in maze-tracing? The correlations are provided in the accompany- 
ing table. 


TABLE IV.
Influence of Time Learnt on Steadiness and Rapidity of Hand.

Correlations                                        Maze I          Maze II         Maze III
No. of bumps and time learnt                        −.216 ± .086    −.286 ± .083    −.209 ± .086
Time taken and time learnt                          −.077 ± .090    +.029 ± .090    −.010 ± .090
Inverse product and time learnt                     +.151 ± .088    +.272 ± .084    +.287 ± .083

No. of bumps and time learnt for constant age       +.132 ± .089    −.008 ± .090    +.089 ± .089
Time taken and time learnt for constant age         +.053 ± .090    +.032 ± .090    −.102 ± .089
Inverse product and time learnt for constant age    −.066 ± .090    +.137 ± .088    +.067 ± .090





Thus while there is no sensible correlation between rapidity of hand and time during which 
drawing has been learnt, the small amount of correlation between steadiness of hand and time 
learnt disappears when we take these correlations for constant age. As far as this material is 
concerned, we see that steadiness and rapidity of hand are not the result of drawing practice, 
but are probably innate characteristics developing with age. This result is so important that it 
needs of course independent verification, but if true its suggestiveness is great. For crafts in which 
these characteristics are essential, they can better be obtained by selection than by training. 


We have endeavoured to throw further light on this point by approaching the subject from
other standpoints. The correlation between age and time learnt is +.5051 ± .0671, and it may be
argued that time learnt in the early years of life is not of very great importance, and that this
possibly accounts for the correlation being rather low. We accordingly confined our attention to
the 33 children who did not learn drawing before 10 years of age. The correlation of age and time
learnt now rises to +.8555 ± .0315†. We then dealt only with steadiness of hand, and took as
our measure the inverse of the number of bumps made on the three mazes added together. We
found steadiness of hand (i.e. inverse of total bumps) and time learnt now had the significant
correlation of +.4030 ± .0755, while age and steadiness of hand gave a correlation of
+.4877 ± .0895.

* The correlations of times taken are increased not lessened, but we have already drawn attention to
some irregularity in the relationship of age and time taken.

† This is only very slightly lowered if we confine our attention to children who make the same total
number of bumps, i.e. age and time learnt for constant steadiness of hand is +.8254 ± .0374.
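The figures following the ± signs can be checked against the usual probable error of a product-moment correlation, $0.6745\,(1 - r^2)/\sqrt{n}$. The note does not state this formula, so its use here is an assumption, and the function name is illustrative; the sample size of 33 is the one quoted for the children who did not learn drawing before 10 years of age.

```python
from math import sqrt


def probable_error_of_r(r, n):
    """Probable error of a correlation r from n cases: 0.6745 (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / sqrt(n)


# Age and time learnt, and age and steadiness of hand, for the 33 children.
print(round(probable_error_of_r(0.8555, 33), 4))
print(round(probable_error_of_r(0.4877, 33), 4))
```

Both values agree with the printed probable errors to the fourth decimal place, which suggests that these two entries were computed on the 33 cases.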
We now corrected these last two correlations for age and time learnt respectively and found:
steadiness of hand and time learnt for constant age −.0034 ± .1174, age and steadiness of hand
for time learnt constant +.3015 ± .1067. There is thus:
(a) no relation between steadiness of hand and time during which drawing has been learnt, if
we correct for age;
(b) a definite relation between steadiness of hand and age, if we correct for length of time
drawing has been learnt.
The corresponding correlations in the case of the whole population under discussion were:
Steadiness of hand and time learnt for constant age   +.0334 ± .0900,
Age and steadiness of hand for time learnt constant   +.3731 ± .0776,
in sensible agreement with those for the special population who had not begun drawing before
10 years, although in the latter case the correlation of age and time learnt was +.8555 as against
+.5051 for the general population.
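A correlation "for constant age" or "for time learnt constant" is a partial correlation. The sketch below assumes that the usual first-order formula was used, which the note does not spell out: $r_{xy.z} = (r_{xy} - r_{xz}r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}$. The function name is illustrative; the inputs are the three total correlations quoted for the 33 children.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y for constant z, from the three total correlations."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


# x = age, y = steadiness of hand, z = time learnt
age_and_steadiness = 0.4877
age_and_time_learnt = 0.8555
steadiness_and_time_learnt = 0.4030

r = partial_correlation(age_and_steadiness, age_and_time_learnt, steadiness_and_time_learnt)
print(round(r, 3))  # close to the value quoted for age and steadiness, time learnt constant
```

Because the inputs are rounded to four places, agreement with the printed partial correlations can only be expected to about the third decimal.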

As far as these results go they confirm the view that steadiness of hand is an innate character 
developing with age, but having little association with training in drawing. 

(5) We now turn to “craft” and “imagination” as factors in drawing ability. If these be 
correlated with efficiency in maze-tracing, it will not necessarily follow that efficiency in maze- 
tracing is associated with effective drawing training, as apart from length of training. Possibly 
craft and imagination in drawing are themselves in the first place innate characters, developing 
no doubt with age, but not necessarily intimately associated with time during which training has 
been given. 

In dealing with “imagination” and “craft” the method of “biserial-r” was adopted. Poor
craft contains the classes “minus,” very bad and bad, and good craft, the classes medium, good
and very good. Poor imagination contains the classes minus, very bad, bad and medium, and
good imagination the classes good and very good*.


We have first to note the influences of age and time learnt on craft and imagination. 




















TABLE V.
Influence of Time Learnt and Age on Craft and Imagination.

Character pair                                        Value of correlation
Good imagination and age                              −.479 ± .094
Good imagination and time learnt                      +.033 ± .090
Good craft and age                                    −.096 ± .112
Good craft and time learnt                            +.166 ± .114
Good imagination and age for constant time learnt     −.675 ± [.060]†
Good imagination and time learnt for constant age     +.363 ± [.078]
Good craft and age for constant time learnt           −.217 ± [.086]
Good craft and time learnt for constant age           +.261 ± [.083]




Now the absolute correlations are extremely interesting, there is no relation between time 
learnt and either imagination or craft; these factors of drawing capacity appear like steadiness and 
* The choice of series was made solely with a view to obtaining not too small frequency in the smaller 


series. 
+ Probable errors of these partial correlations are given as rough estimates for they are calculated on 
the basis of all the component correlations having been found by the product-moment method. 
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rapidity of hand to be innate. The influence of age on craft is not significant, but age seems to 
weaken imagination, i.e. the younger children are more imaginative in their drawing work. If we 
turn to the partial correlations we see, however, the bearing of these results. For a constant age 
imagination is moderately influenced by the time learnt; but for a constant amount of training it 
depreciates more markedly with age. The result is that there is no apparent relation of imagination 
to training. The diminution of the innate character with age is really more influential than its 
growth with training. Again for constant age there is a very moderate influence of training on 
craftsmanship, but for a constant time learnt craft diminishes with age*. The result again is 
that innate change with age masters training, and unless training is persistent, good craft will 
lessen, so that the correlation with age is either negative or insensible. The small influence of 
time learnt on these factors of drawing efficiency is remarkable, and it is highly suggestive to see 
that in certain characters training may only suffice to prevent deterioration, and does not provide 
a marked expansion of efficiency. 
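The way a negligible total correlation can conceal a sensible partial correlation may be illustrated numerically. The sketch below assumes the standard first-order partial correlation formula; the section does not show the computation actually used, and the three input correlations are hypothetical values chosen only for illustration.

```python
from math import sqrt


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# Hypothetical inputs (not taken from the paper's tables):
r_imagination_training = 0.0   # no apparent total relation
r_imagination_age = -0.6       # imagination falls with age
r_training_age = 0.5           # older pupils have learnt longer

# Holding age constant reveals a positive relation between
# imagination and training that the total correlation hides.
print(partial_correlation(r_imagination_training,
                          r_imagination_age,
                          r_training_age))
```

With these assumed inputs the partial coefficient is positive although the total coefficient is zero, which is the pattern the paragraph describes for imagination and training.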

We may now turn to the influence of imagination and craft on the steadiness and rapidity of 
hand exhibited in maze-tracing. 








TABLE VI.

Influence of Craft and Imagination on Steadiness and Rapidity of Hand.

Characters                                              Maze I             Maze II            Maze III

Good imagination and no. of bumps                       + .239 ± .109      + .187 ± .109      + .252 ± .109
Good imagination and time taken                         + .005 ± .115      + .029 ± .114      + .167 ± .112
Good craft and no. of bumps                             − .163 ± .113      − .109 ± .114      − .029 ± .115
Good craft and time taken                               + .500 ± .093      + .104 ± .114      − .308 ± .107

Good imagination and no. of bumps for constant age      − .059 ± [.090]†   − .092 ± [.089]    − .014 ± [.090]
Good imagination and time taken for constant age        − .130 ± [.089]    + .035 ± [.090]    + .277 ± [.083]
Good craft and no. of bumps for constant age            − .272 ± [.084]    − .189 ± [.087]    − .110 ± [.089]
Good craft and time taken for constant age              + .494 ± [.068]    + .104 ± [.089]    − .298 ± [.082]








Now these results are of much interest and suggestive for further inquiry as soon as we are able
to deal with much larger numbers. In the correlation uncorrected for age there would appear to be 
slight relation between good imagination and a large number of bumps, but it is only because the 
younger students are more imaginative. Corrected for age the correlations are all reversed, but 
are seen to be of no significance. Good imagination and time taken cannot be considered to have 
significant relationship either before or after correction for age. Thus the imaginative factor in 
drawing skill is not sensibly associated with rapidity or steadiness of hand. 

With regard to craft there do appear to be significant associations, but they are clearly 
changing with growth of experience in maze-tracing. Uncorrected for age, there is no really 
significant association between good craft and steadiness of hand although the constancy of sign 
is to be noted. After correction for age it would appear probable that a small association exists 
—good craft having the steadier hand. But the relationship appears to be weakening with ex- 
perience and is hardly significant in the third maze. The same change makes itself manifest in the 
time taken. Those with good craft took the longer time in the first maze and there is quite a
sensible degree of correlation; in the second maze the correlation is insensible, while in the third
it has become negative. In other words in the third maze good craft has begun to tell. It would
require far more material and prolonged experiment to be certain how far it is the hesitation of the
good craftsman over a novel task (Maze I) or the greater difficulty of the third maze which has
told in the favour of good craft in that case. All we can assert is that within the small range of our
experiment the slight relation of good craft to steadiness of hand appears to decrease, while the
relationship of good craft to rapidity of hand is only beginning to develop in the third experiment,
and is then not of any substantial intensity. We have already seen that there is only a small
association of good craft and time learnt, about enough to allow for the deterioration of craft
with age. Hence the slight relationship suggested between good craft and rapidity of hand is not
necessarily an argument in favour of such rapidity arising from training; it may well be the result
of an association of the innate characters.

* Miss Dalgliesh reports that she judged of the craft capacity of her pupils quite apart from their
age or technical ability. Thus given two children of 9 and 16 years of age whom she had rated with the
same grade of craft capacity, the elder child would (if teachable) be doing the better drawing work,
having had as a rule longer training. But this increased technical power was not regarded in the craft
grading.

† See second footnote, p. 174.

(6) The remark at the end of the last section leads us directly to the problem of whether other 
qualities than those of draughtsmanship can be directly associated with steadiness and rapidity 
of hand. A priori we think there is much to be said for both mathematical and musical capacity 
being innate *. The former except in the case of geometrical drawing gives small training for the 
hand, but it does enable the owner of the capacity to realise more or less vividly a conception of 
the desired perfect maze description. On the other hand music not only gives much finger practice, 
but in the case of special ability probably signifies an inherited flexibility of hand. In our division 
of the material we have made only two classes—those of the students who possessed marked 
ability in music and in mathematics were separated into small classes from the remainder—the 
mediocre, the non-mathematical and the non-musical. We then applied the biserial method. 


TABLE VII.

Association of Mathematical and Musical Capacity with Steadiness and Rapidity of Hand.

Characteristics                                             Maze I             Maze II            Maze III

Mathematical capacity and no. of bumps                      − .112 ± .139      − .216 ± .136      − .136 ± .138
Musical capacity and no. of bumps                           − .091 ± .170      − .042 ± .170      − .088 ± .170
Mathematical capacity and time taken                        − .608 ± .106      − .390 ± .126      − .214 ± .135
Musical capacity and time taken                             − .303 ± .162      − .233 ± .165      − .029 ± .170

Mathematical capacity and no. of bumps for constant age     − .012 ± [.090]    − .148 ± [.088]    − .049 ± [.090]
Musical capacity and no. of bumps for constant age          − .042 ± [.090]    + .012 ± [.090]    − .042 ± [.090]
Mathematical capacity and time taken for constant age       − .592 ± [.059]    − .396 ± [.076]    − .237 ± [.085]
Musical capacity and time taken for constant age            − .290 ± [.083]    − .234 ± [.085]    − .046 ± [.090]





Now none of the correlations of mathematical or musical capacity with steadiness of hand are
in themselves significant, but as all six of them are of the same sign, we may possibly assert a
slender absolute relation between both and steadiness of hand above the average. On the other
hand the rapidity of hand shows at first a definite, and in the case of mathematics a marked,
relationship with both mathematical and musical capacity. But this relationship seems rapidly to
diminish with experience of maze-tracing. It is probable that the mathematicians had at first an
advantage which was not maintained. It may be suspected that they realised better at the outset
what was required; an appreciation also of time taken may have been a factor at the outset with
the musicians. But both these advantages seem to diminish as the non-mathematical and
non-musical gain experience. The above remarks refer to the total correlations; when corrected for
age we see that the conclusions for steadiness of hand are confirmed; even for constant age it has
no sensible relationship to either mathematical or musical capacity. Rapidity of hand does show
relationship with mathematics in a more marked, and with music in a less marked degree when we
correct for age. But again both the associations are lessening with experience; for the third maze
the superiority of the good mathematicians is only very moderate and the superiority of the good
musicians has become insensible. There can be little doubt that the more marked superiority
for Maze I was due to a better appreciation of what was needed in a novel task.

* The correlation of mathematical capacity with age was + .175 ± .137 and of musical capacity with
age + .098 ± .169, which are satisfactory as showing that the teachers really judged capacity and not
knowledge; they are as far as they go also some evidence for mathematical and musical capacity
being innate characteristics. Direct evidence for the hereditary character of musical capacity may be
found of course in the pedigree of the Bach family. It is less demonstrated in the case of mathematics,
but the Gregories might be cited, and possibly one or two recent instances will occur to those familiar
with the Cambridge Tripos Lists.


(7) Conclusions. The material dealt with is admittedly slender, and was analysed only as a step 
towards more elaborate returns; it was made in order to determine what additional experiments 
should be tried. Our conclusions are therefore suggestive rather than dogmatic. We have not 
been able to associate in a marked degree rapidity and steadiness of hand as exhibited in maze- 
tracing with any training; we more than suspect them to be innate characteristics *. Good craft, 
mathematical ability and musical capacity seem to some not very marked extent associated with 
rapidity of hand, but it is noteworthy that even in these cases the advantage was rather an initial 
one and tended to weaken with experience. An apparently noteworthy point, which is well worth 
confirmation or contradiction, is that continued training may only just suffice to maintain a grade 
of efficiency, which deteriorates with age. It would be of much interest to demonstrate that 
training in some cases does not create or even develop a faculty, but maintains it at the higher 
range of efficiency which belongs to an earlier age. It is possible that the teacher cannot develop 
imagination in the later stages of youthful growth, but may be able to preserve the greater 
imagination of the child by proper training. Certain faculties may be most intense at certain 
stages of growth. If education seizes upon them at this age and maintains their then intensity, 
we may be apt to overlook their history, and suppose them created by the educational process. 
The point is worth a direct and more intensive investigation. Here it is only a suggestion. 


I have to thank Professor Pearson for his assistance during the preparation of this paper. 


[* After thirty years' experience in teaching in a drawing office I think it safe to say that within a
fortnight it is possible to assert of the bulk of freshmen engineers whether or no they will be good
draughtsmen at the end of their two to three years' course. The power of rapid, steady, uniform bold
work is there in germ or it is not. Knowledge of method and accuracy of result may be acquired, but
only to a minor extent can anyone acquire that which distinguishes a good from a mediocre
draughtsman. K. P.]


II. On the Moments of the Normal Correlation Function of
n Variables†.


By SVERKER BERGSTRÖM, STOCKHOLM.


1. The normal correlation function of n variables can be written, according to the celebrated
theorem of Pearson‡,

\[ Z = \frac{1}{(2\pi)^{n/2}\,\sigma_1\sigma_2\cdots\sigma_n\sqrt{R}}\,
\exp\Big\{-\frac{1}{2R}\sum_{p=1}^{n}\sum_{q=1}^{n} R_{pq}\,\frac{X_pX_q}{\sigma_p\sigma_q}\Big\} \qquad (1), \]

where

\[ R = \begin{vmatrix} 1 & r_{12} & \cdots & r_{1n} \\ r_{21} & 1 & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & 1 \end{vmatrix} \qquad (2), \]

and $R_{pq}$ is the cofactor, or minor, of the element in the $p$th row and the $q$th column. The
variables $X_p$ are measured from their mean values and $\sigma_p$ are their standard deviations;
$r_{pq}$ denotes the correlation coefficient between $X_p$ and $X_q$, whence

\[ r_{pq} = r_{qp}, \qquad r_{pp} = 1. \]

The moment $P_{a_1a_2\ldots a_n}$ is, by definition,

\[ P_{a_1a_2\ldots a_n} = \int\!\!\int\!\cdots\!\int X_1^{a_1}X_2^{a_2}\cdots X_n^{a_n}\,Z\,dX_1\,dX_2\cdots dX_n. \]

It is the mean value of $X_1^{a_1}X_2^{a_2}\cdots X_n^{a_n}$.

To simplify the writing we shall use normal coordinates

\[ x_p = \frac{X_p}{\sigma_p}. \]

Let us introduce, in addition, the notations

\[ a_{pq} = \frac{R_{pq}}{R}, \qquad z_0 = \frac{1}{\sqrt{R}} \qquad (6). \]

The correlation function becomes

\[ z = \frac{z_0}{(2\pi)^{n/2}}\; e^{-\frac{1}{2}\sum\sum a_{pq}x_px_q}. \]

Our aim will be to calculate the mean value

\[ M\,[x_1^{a_1}x_2^{a_2}\cdots x_n^{a_n}]. \]

[† The present paper reached the editor later than the three other memoirs dealing with allied topics
published in this part of the Journal, but the methods adopted are of sufficient interest to justify its
appearance in association with those papers.]

‡ See K. Pearson, Phil. Trans. Vol. 187 (1896) and Vol. 200 (1903).
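2. Consider the integral

\[ J = \int\!\!\int\!\cdots\!\int x_1^{a_1}x_2^{a_2}\cdots x_n^{a_n}\; e^{-\frac{1}{2}\sum\sum a_{pq}x_px_q}\,dx_1\,dx_2\cdots dx_n. \]

It is easy to verify that this integral is uniformly convergent, provided that

\[ |a_{pq}| > k > 0, \]

k being a fixed number.

Differentiating the integral J with respect to $a_{pq}$*, we shall have, apart from a numerical factor,

\[ J_1 = \int\!\!\int\!\cdots\!\int x_1^{a_1}\cdots x_p^{a_p+1}\cdots x_q^{a_q+1}\cdots x_n^{a_n}\; e^{-\frac{1}{2}\sum\sum a_{pq}x_px_q}\,dx_1\,dx_2\cdots dx_n. \]

Hence, if J is a known function of the $a_{pq}$, one can obtain $J_1$ by a differentiation. It follows
that our question reduces to the calculation of the derivatives, with respect to the coefficients
$a_{pq}$, of the integral

\[ I = \int\!\!\int\!\cdots\!\int e^{-\frac{1}{2}\sum\sum a_{pq}x_px_q}\,dx_1\,dx_2\cdots dx_n \qquad (8), \]

whose value is, according to Pearson,

\[ I = \frac{(2\pi)^{n/2}}{\sqrt{D}} \qquad (9), \]

D being the determinant of the $a_{pq}$.

* It must be remembered that $a_{pq} = a_{qp}$.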
3. Let us now introduce the notations

\[ F'_{pq} = \Big(\frac{\partial}{\partial a_{pq}} + \frac{\partial}{\partial a_{qp}}\Big) F \qquad (11), \]

\[ F''_{p_1q_1,\,p_2q_2} = \Big(\frac{\partial}{\partial a_{p_1q_1}} + \frac{\partial}{\partial a_{q_1p_1}}\Big)\Big(\frac{\partial}{\partial a_{p_2q_2}} + \frac{\partial}{\partial a_{q_2p_2}}\Big) F, \]

\[ F^{(k)}_{p_1q_1,\,p_2q_2,\,\ldots,\,p_kq_k} = \prod_{i=1}^{k}\Big(\frac{\partial}{\partial a_{p_iq_i}} + \frac{\partial}{\partial a_{q_ip_i}}\Big) F \qquad (12). \]

We shall say that by this process one calculates the symmetric derivative of F. The definition
extends to the case p = q, provided it is agreed that

\[ F'_{pp} = 2\,\frac{\partial F}{\partial a_{pp}}. \]

Let us thus differentiate (8) and (9) successively:

\[ M\,[x_{p_1}x_{q_1}] = \frac{z_0}{2\sqrt{D}}\,\frac{D'_{p_1q_1}}{D} \qquad (13), \]

\[ M\,[x_{p_1}x_{q_1}x_{p_2}x_{q_2}] = \frac{z_0}{4\sqrt{D}}\Big\{3\,\frac{D'_{p_1q_1}D'_{p_2q_2}}{D^2} - 2\,\frac{D''_{p_1q_1,\,p_2q_2}}{D}\Big\} \qquad (14). \]

One is led to put, quite generally,

\[ M\,[x_{p_1}x_{q_1}\cdots x_{p_k}x_{q_k}] = \frac{z_0}{2^k\sqrt{D}}\Big\{(2k-1)!!\,\Sigma_k^{k} - (2k-3)!!\,2\,\Sigma_k^{k-1} + \cdots + (-1)^{i-1}(2k-2i+1)!!\,2^{i-1}\,\Sigma_k^{k-i+1} + \cdots\Big\} \qquad (15), \]

where, for brevity,

\[ (2q+1)!! = (2q+1)(2q-1)(2q-3)\cdots 3\cdot 1 \qquad (16), \]

and where $\Sigma_k^i$ denotes a sum of terms containing i symmetric derivatives divided by D, the
sum of whose symbolic exponents is equal to k; there are in $\Sigma_k^i$ as many such terms as there
are ways of grouping k elements into i groups, no account being taken of the order of equal groups.
This number is

\[ g_k^i = \sum \frac{k!}{(a_1!)^{m_1}\,m_1!\,(a_2!)^{m_2}\,m_2!\cdots} \qquad (17), \]

where

\[ m_1a_1 + m_2a_2 + \cdots = k, \qquad m_1 + m_2 + \cdots = i \qquad (18). \]

Formula (15) can be verified by induction. We do not dwell on this; it presents no difficulty.


4. We must consider, first, a partial derivative of the kind of which the symmetric derivatives are
composed, and then the sums $\Sigma_k^i$.

The derivative

\[ \frac{\partial^k D}{\partial a_{p_1q_1}\,\partial a_{p_2q_2}\cdots\partial a_{p_kq_k}} \]

must evidently be equal, apart perhaps from sign, to the cofactor of the term
$a_{p_1q_1}a_{p_2q_2}\cdots a_{p_kq_k}$. The sign is obtained by evaluating the number of inversions
in $p_1, p_2, \ldots, p_k$, say $i$, and in $q_1, q_2, \ldots, q_k$, say $i'$.

Now, in virtue of the well-known theorem on the connection between a minor of a determinant and
that of its reciprocal, and because of the identity

\[ D = \frac{1}{R}, \]

this cofactor can be written

\[ \frac{R^{\,n-k-1}\,R^{\,q_1q_2\ldots q_k}_{\,p_1p_2\ldots p_k}}{R^{\,n-k}} = \frac{R^{\,q_1q_2\ldots q_k}_{\,p_1p_2\ldots p_k}}{R}, \]

where the determinant in the numerator is formed by the elements common to the rows
$p_1, p_2, \ldots, p_k$ and to the columns $q_1, q_2, \ldots, q_k$ of the determinant R (2). Hence

\[ \frac{\partial^k D}{\partial a_{p_1q_1}\cdots\partial a_{p_kq_k}} = (-1)^{i+i'}\,\frac{R^{\,q_1q_2\ldots q_k}_{\,p_1p_2\ldots p_k}}{R} \qquad (19). \]

One can moreover avoid considering the number of inversions by substituting for
$R^{\,q_1q_2\ldots q_k}_{\,p_1p_2\ldots p_k}$ a determinant whose principal diagonal consists of the
elements $r_{p_1q_1}, r_{p_2q_2}, \ldots, r_{p_kq_k}$, and which we shall denote by exchanging the
letter R for Q. Indeed, if we permute rows or columns in the determinant Q, only the sign changes.
The indices of the determinant R having no inversion, we see that

\[ Q^{\,q_1q_2\ldots q_k}_{\,p_1p_2\ldots p_k} = (-1)^{i+i'}\,R^{\,q_1q_2\ldots q_k}_{\,p_1p_2\ldots p_k}. \]

We shall therefore have finally

\[ \frac{\partial^k D}{\partial a_{p_1q_1}\cdots\partial a_{p_kq_k}} = \frac{1}{R}\,Q^{\,q_1q_2\ldots q_k}_{\,p_1p_2\ldots p_k} = D\,Q^{\,q_1q_2\ldots q_k}_{\,p_1p_2\ldots p_k} \qquad (20), \]

where

\[ Q^{\,q_1q_2\ldots q_k}_{\,p_1p_2\ldots p_k} = \begin{vmatrix} r_{p_1q_1} & r_{p_1q_2} & \cdots & r_{p_1q_k} \\ r_{p_2q_1} & r_{p_2q_2} & \cdots & r_{p_2q_k} \\ \vdots & \vdots & & \vdots \\ r_{p_kq_1} & r_{p_kq_2} & \cdots & r_{p_kq_k} \end{vmatrix} \qquad (21). \]

If two of the numbers p (or of the q) are equal, the determinant (21) will contain two equal rows
(or columns): it will vanish; and, indeed, the second derivative of a determinant with respect
to two elements belonging to the same row (or to the same column) is zero.

The symmetric derivative of the kth order will consist, by (12), of $2^k$ simple derivatives such
as (20). They are found by interchanging the p and the q of the same index in all possible ways.
Denoting these interchanges by S, we shall have

\[ D^{(k)}_{p_1q_1,\,p_2q_2,\,\ldots,\,p_kq_k} = \frac{1}{R}\;S\,Q^{\,q_1q_2\ldots q_k}_{\,p_1p_2\ldots p_k} \qquad (22). \]

It must be remarked that the term formed from the principal diagonal of (21) will be reproduced
once and once only in each simple derivative; there will be $2^k$ of them in the derivative (22).


5. Let us return to the expression (15). It will simplify. First of all

\[ z_0 = \sqrt{D}. \]

Next, let us consider a term in $\Sigma_k^i$, say for example

\[ \frac{D^{(n_1)}\,D^{(n_2)}\cdots D^{(n_i)}}{D^{\,i}} \qquad (23), \]

where $n_1 + n_2 + \cdots + n_i = k$.

By (22),

\[ \frac{D^{(n)}_{p_1q_1,\,\ldots,\,p_nq_n}}{D} = S\,Q^{\,q_1q_2\ldots q_n}_{\,p_1p_2\ldots p_n}. \]

This is a polynomial in r, homogeneous and of order n.
This being so, one sees easily that the expression (23), also homogeneous, is of order

\[ n_1 + n_2 + \cdots + n_i = k. \]

The term

\[ r_{p_1q_1}\,r_{p_2q_2}\cdots r_{p_kq_k} \qquad (24) \]

will have in it the coefficient $2^{\,n_1+n_2+\cdots+n_i} = 2^k$.

Hence the term (24) will obtain in $\Sigma_k^i$ the coefficient $2^k g_k^i$.

Generally, on substituting in the expression (15), one sees that it is a homogeneous function
of order k in r.
The nature of the problem requires that this function be also symmetric.
Indeed, let us permute the indices of the left-hand side of (15); since the result must not change
on that account, the term (24), with its indices permuted in the same manner, keeps the same coefficient.
6. We therefore need consider only a single term of (15), say (24).
Its coefficient is

\[ (2k-1)!!\,g_k^{k} - (2k-3)!!\,2\,g_k^{k-1} + (2k-5)!!\,2^2\,g_k^{k-2} - \cdots \pm 2^{k-1}g_k^{1}
 = \sum_{i=1}^{k} (-1)^{i-1}(2k-2i+1)!!\;2^{i-1}\,g_k^{\,k-i+1} \qquad (25). \]

I say that this expression is equal to unity, whatever k may be.
I shall use reasoning by induction.
The assertion being evident for k = 1, I suppose it proved for k, and we shall see
that it still holds for k + 1, that is to say that the expression

\[ \sum_{i=1}^{k+1} (-1)^{i-1}(2k-2i+3)!!\;2^{i-1}\,g_{k+1}^{\,k-i+2} \qquad (26) \]

will be equal to unity.

Let us first consider the numbers $g_n^i$.
Let us seek $g_{n+1}^i$, supposing $g_n^i$ known.
Either the (n + 1)th element must be placed in one of the i groups formed of n elements: there are

\[ i\,g_n^{i} \]

possibilities.

Or this element must form a group by itself, the other elements then forming (i − 1) groups; one
will thus have

\[ g_n^{\,i-1} \]

possibilities.

Whence the formula

\[ g_{n+1}^{\,i} = i\,g_n^{\,i} + g_n^{\,i-1} \qquad (27). \]

On agreeing that $g_n^{\,n+1} = g_n^{\,0} = 0$, this formula still holds for i = n + 1 and i = 1.

The expression (26) can now be written, in virtue of (27),

\[ \sum_{i=1}^{k+1} (-1)^{i-1}(2k-2i+3)!!\;2^{i-1}\big\{(k-i+2)\,g_k^{\,k-i+2} + g_k^{\,k-i+1}\big\} \]
\[ = \sum_{i=2}^{k+1} (-1)^{i-1}(2k-2i+3)!!\;2^{i-1}(k-i+2)\,g_k^{\,k-i+2}
 + \sum_{i=1}^{k} (-1)^{i-1}(2k-2i+3)!!\;2^{i-1}\,g_k^{\,k-i+1}. \]

Changing i into (i + 1) in the first sum, one finds finally for (26)

\[ \sum_{i=1}^{k} (-1)^{i-1}(2k-2i+1)!!\;2^{i-1}\big\{-2(k-i+1) + (2k-2i+3)\big\}\,g_k^{\,k-i+1}. \]

This is nothing other than (25); hence the sum (26) is indeed equal to unity. Q.E.D.
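The identity asserted for (25) can be checked numerically for small k. The following is a minimal sketch, assuming that the numbers $g_n^i$ are generated by the recurrence (27) with the stated boundary convention; the function names are illustrative and do not come from the paper.

```python
def double_factorial(m: int) -> int:
    """m!! = m (m - 2) (m - 4) ..., with 1!! = 0!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def groupings(n: int, i: int) -> int:
    """g_n^i: ways of grouping n elements into i unordered groups, via (27)."""
    if n == 0:
        return 1 if i == 0 else 0
    if i < 1 or i > n:
        return 0  # boundary convention g_n^{n+1} = g_n^0 = 0
    return i * groupings(n - 1, i) + groupings(n - 1, i - 1)


def expression_25(k: int) -> int:
    """Left-hand side of (25) evaluated for a given k."""
    return sum(
        (-1) ** (i - 1)
        * double_factorial(2 * k - 2 * i + 1)
        * 2 ** (i - 1)
        * groupings(k, k - i + 1)
        for i in range(1, k + 1)
    )


for k in range(1, 9):
    print(k, expression_25(k))
```

Under these assumptions every printed value should be 1, in agreement with the induction argument of the text.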
7. Taking account of this, we write for (15) the general formula

\[ M\,[x_{p_1}x_{p_2}\cdots x_{p_{2k-1}}x_{p_{2k}}] = \sum r_{p_1p_2}\,r_{p_3p_4}\cdots r_{p_{2k-1}p_{2k}} \qquad (28). \]

The indices of the r in this sum are obtained by grouping the numbers p into k groups, two by
two, without regard to the order either of the groups or of the elements of a group. The sum
(28) will therefore contain in all

\[ \frac{(2k)!}{2^k\,k!} \]

terms.

If some of the p are equal, they must be imagined as provided with distinguishing indices; in the
final result one recalls that

\[ r_{pp} = 1. \]

Let, for example, $p_1 = p_2 = \cdots = p_{2k}$.
One recovers the known formula

\[ M\,[x_1^{2k}] = \frac{(2k)!}{2^k\,k!} = (2k-1)!!. \]
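The pairing rule of formula (28) lends itself to a direct enumeration. The sketch below is an illustration only: the helper names, the zero-based indexing of the variables, and the example correlation matrix are assumptions introduced here, not part of the paper.

```python
def pairings(items):
    """Yield every way of splitting `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def normal_moment(indices, r):
    """Mean of the product of the normal coordinates x_p, p in `indices`,
    computed as the sum over pairings of products of r[p][q], as in (28)."""
    indices = list(indices)
    if len(indices) % 2:
        return 0.0  # an odd number of factors gives a zero moment
    total = 0.0
    for pairing in pairings(indices):
        term = 1.0
        for p, q in pairing:
            term *= r[p][q]
        total += term
    return total


# Hypothetical two-variable correlation matrix with r_12 = 0.5.
r = [[1.0, 0.5],
     [0.5, 1.0]]
print(normal_moment((0, 0, 1, 1), r))   # 2 r_12^2 + 1
print(normal_moment((0,) * 6, r))       # number of pairings of six elements
```

Repeated indices are treated as distinct elements during the pairing and then evaluated with $r_{pp} = 1$, which mirrors the convention stated in the text.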


8. Let us consider the general case

\[ M\,[x_1^{a_1}x_2^{a_2}\cdots x_p^{a_p}]. \]

By formula (28) this mean value can be written

\[ \sum A \prod_{i=1}^{p}\prod_{j>i} r_{ij}^{\,e_{ij}} \qquad (29), \]

where the $e_{ij}$ are the solutions of the system

\[ \begin{aligned}
e_{11} + e_{12} + \cdots + e_{1p} &= a_1,\\
e_{12} + e_{22} + \cdots + e_{2p} &= a_2,\\
&\;\;\vdots\\
e_{1p} + e_{2p} + \cdots + e_{pp} &= a_p,
\end{aligned} \qquad (30) \]

the $e_{ii}$ being required, in addition, to be even numbers.

Let us consider the coefficient A.
One must arrange in all possible ways $a_j$ elements in p groups containing respectively
$e_{1j}, e_{2j}, \ldots, e_{pj}$ elements, taking account of the order of the groups; there will be

\[ \frac{a_j!}{e_{1j}!\,e_{2j}!\cdots e_{pj}!} \]

ways.

Next one must couple $e_{ij}$ elements with as many elements; this gives

\[ e_{ij}! \]

ways.

Finally one must arrange $e_{ii}$ elements in $\tfrac{1}{2}e_{ii}$ groups, two by two, taking no
account of the order of the groups. There will be

\[ (e_{ii} - 1)!! \]

possibilities.

Whence, after reductions,

\[ M\,[x_1^{a_1}x_2^{a_2}\cdots x_p^{a_p}] = \sum \prod_{i=1}^{p}\frac{a_i!}{e_{ii}!!}\prod_{j>i}\frac{r_{ij}^{\,e_{ij}}}{e_{ij}!} \qquad (31), \]

where the $e_{ij}$ are the solutions of the system (30) and where

\[ e_{ii}!! = e_{ii}\,(e_{ii} - 2)\cdots, \qquad 0!! = 1. \]

Let us treat as an application the case p = 2. One sees that $e_{12}$ is at most equal to the smaller
of the numbers $a_1$ and $a_2$, say $a_2$. This gives

\[ M\,[x_1^{a_1}x_2^{a_2}] = a_1!\,\Big\{\frac{r_{12}^{\,a_2}}{(a_1-a_2)!!}
 + \frac{a_2(a_2-1)}{2!!\,(a_1-a_2+2)!!}\,r_{12}^{\,a_2-2}
 + \frac{a_2(a_2-1)(a_2-2)(a_2-3)}{4!!\,(a_1-a_2+4)!!}\,r_{12}^{\,a_2-4} + \cdots\Big\} \qquad (32). \]
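Formula (31) in the case p = 2 can be evaluated mechanically by running over the admissible even values of $e_{22}$. The following sketch assumes the reading of (30) and (31) in which $e_{11}$ and $e_{22}$ are even and $e_{12} = a_2 - e_{22}$; the function names are illustrative only.

```python
from fractions import Fraction
from math import factorial


def even_double_factorial(m: int) -> int:
    """m!! for even m >= 0, with 0!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def bivariate_moment(a1: int, a2: int, r):
    """Mean of x1^a1 * x2^a2 for two normal coordinates with correlation r,
    summed over the solutions (e11, e12, e22) of system (30) as in (31)."""
    if (a1 + a2) % 2:
        return 0  # no admissible solution: the moment vanishes
    if a1 < a2:
        a1, a2 = a2, a1  # the moment is symmetric in the two variables
    total = 0
    for e22 in range(0, a2 + 1, 2):
        e12 = a2 - e22
        e11 = a1 - e12
        total += (factorial(a1) * factorial(a2) * r ** e12
                  / (even_double_factorial(e11)
                     * even_double_factorial(e22)
                     * factorial(e12)))
    return total


r12 = Fraction(1, 2)  # hypothetical correlation used for the check
print(bivariate_moment(2, 2, r12))  # 2 r^2 + 1
print(bivariate_moment(4, 2, r12))  # 12 r^2 + 3
print(bivariate_moment(3, 3, r12))  # 6 r^3 + 9 r
```

Using exact fractions keeps the comparison with the tabulated polynomials free of rounding error.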


I write finally the moments up to the sixth order, using the notation

\[ \beta_{a_1a_2\ldots a_p} = M\,[x_1^{a_1}x_2^{a_2}\cdots x_p^{a_p}]. \]

\[ \begin{aligned}
\beta_2 &= 1; \qquad \beta_{11} = r_{12}.\\
\beta_4 &= 3; \qquad \beta_{31} = \beta_{13} = 3r_{12}, \qquad \beta_{22} = 2r_{12}^2 + 1;\\
\beta_{211} &= 2r_{12}r_{13} + r_{23};\\
\beta_{1111} &= r_{12}r_{34} + r_{13}r_{24} + r_{14}r_{23}.\\
\beta_6 &= 15; \qquad \beta_{51} = 15r_{12}, \qquad \beta_{42} = 12r_{12}^2 + 3, \qquad \beta_{33} = 6r_{12}^3 + 9r_{12};\\
\beta_{411} &= 12r_{12}r_{13} + 3r_{23}, \qquad \beta_{321} = 6r_{12}^2r_{13} + 6r_{12}r_{23} + 3r_{13},\\
\beta_{222} &= 8r_{12}r_{13}r_{23} + 2r_{12}^2 + 2r_{13}^2 + 2r_{23}^2 + 1;\\
\beta_{3111} &= 6r_{12}r_{13}r_{14} + 3r_{12}r_{34} + 3r_{13}r_{24} + 3r_{14}r_{23},\\
\beta_{2211} &= 2r_{12}^2r_{34} + 4r_{12}r_{13}r_{24} + 4r_{12}r_{14}r_{23} + 2r_{13}r_{14} + 2r_{23}r_{24} + r_{34};\\
\beta_{21111} &= 2r_{12}r_{13}r_{45} + 2r_{12}r_{14}r_{35} + 2r_{12}r_{15}r_{34} + 2r_{13}r_{14}r_{25} + 2r_{13}r_{15}r_{24} + 2r_{14}r_{15}r_{23}\\
&\quad + r_{23}r_{45} + r_{24}r_{35} + r_{25}r_{34};\\
\beta_{111111} &= r_{12}r_{34}r_{56} + r_{13}r_{24}r_{56} + \cdots \;(15\ \text{terms}).
\end{aligned} \]

The moments up to order 4 in two variables were calculated by Pearson and then by
Soper, who gave a general formula for the moments in two variables; Biometrika, Vol. IX,
1913, p. 101. (It should be remarked, however, that his formula (xxx) is affected by a
typographical error.)
The case of three variables was treated by Wicksell in "The general characteristics of the
frequency function of stellar movements, etc." Lund 1915, p. 11.
Finally Isserlis deduced our formula (28) for the case 2k = 4, that is, of four variables, in
Biometrika, Vol. XI, 1916, p. 189.

III. Formulae for determining the Mean Values of Products of Devia- 
tions of mixed Moment Coefficients in two to eight Variables in 
Samples taken from a limited Population. 


By L. ISSERLIS, D.Sc.* 


A. Let $p_{12}$, $p_{34}$ denote the product moment coefficients, referred to a fixed origin, in a sample
of size n extracted out of a population of size N, there being four independent variables $x_1$, $x_2$,
$x_3$, $x_4$. The mean value of $p_{12}$ in many samples is $P_{12}$, the corresponding product moment
coefficient for the sampled population. Let $dp_{12} = p_{12} - P_{12}$ denote the deviation of the moment
coefficient of the sample from its mean value, then

\[ \text{Mean value of } dp_{12}\,.\,dp_{34} \text{ in many samples} = \frac{\chi}{n}\,(P_{1234} - P_{12}P_{34}), \]

where

\[ \chi = \frac{N-n}{N-1}, \]

and $P_{1234}$ is the product moment coefficient with respect to the four variables.

This result gives many particular cases if we identify two or more of the variables. For instance

\[ \text{Mean value of } dp_{1^2}\,dp_{2^2} = \frac{\chi}{n}\,(P_{1^22^2} - P_{1^2}P_{2^2}), \]

where $P_{1^2}$ is the second moment coefficient with regard to $x_1$, i.e. $\mu'_2$ in the usual notation
for one variable, and so forth. Also

\[ \text{Mean value of } (dp_{1^2})^2 = \frac{\chi}{n}\,(P_{1^4} - P_{1^2}^{\,2}). \]

[* Dr Isserlis sent me a paper containing the results of the present note with others accompanied
by proofs in the course of 1916. It has been impossible to publish that paper so far, but it is only
fair to him, having regard to the fact that other investigators are now entering this field, to publish
his formulae in association with the memoirs printed in this part of Biometrika. EDITOR.]


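B. Similarly if there are 6 independent variables $x_1, x_2, \ldots, x_5, x_6$,

\[ \text{the mean value of } dp_{12}\,dp_{34}\,dp_{56}
 = \frac{\chi\chi'}{n^2}\,\{P_{123456} - P_{1234}P_{56} - P_{3456}P_{12} - P_{1256}P_{34} + 2P_{12}P_{34}P_{56}\}, \]

where

\[ \chi' = \frac{N-2n}{N-2}. \]

Particular cases are

\[ \text{Mean value of } (dp_{1^2})^3 = \frac{\chi\chi'}{n^2}\,(P_{1^6} - 3P_{1^4}P_{1^2} + 2P_{1^2}^{\,3}), \]

or

\[ \text{Mean value of } (d\mu'_2)^3 = \frac{\chi\chi'}{n^2}\,(\mu'_6 - 3\mu'_4\mu'_2 + 2\mu_2'^{\,3}), \]

and

\[ \text{Mean value of } dp_{12}\,dp_{23}\,dp_{31} = \frac{\chi\chi'}{n^2}\,[P_{1^22^23^2} - P_{12^23}P_{31} - P_{23^21}P_{12} - P_{31^22}P_{23} + 2P_{12}P_{23}P_{31}]. \]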
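The single-variable particular cases of (A) and (B) can be checked exactly on a small finite population by enumerating every sample drawn without replacement. The sketch below is illustrative only: the population values are hypothetical, the variate whose moments are taken is $y = x^2$ (so that its k-th raw moment plays the part of $P_{1^{2k}}$), and the function names are assumptions.

```python
from fractions import Fraction
from itertools import combinations


def raw_moment(values, k):
    """k-th moment of the population about the fixed origin."""
    return Fraction(sum(v ** k for v in values), len(values))


def enumerated_mean_power(values, n, power):
    """Exact mean of (sample moment - population moment)^power over all
    samples of size n drawn without replacement."""
    target = raw_moment(values, 1)
    samples = list(combinations(values, n))
    return sum((Fraction(sum(s), n) - target) ** power for s in samples) / len(samples)


def formula_second_and_third(values, n):
    """Particular cases of (A) and (B) for a single variable."""
    N = len(values)
    m1, m2, m3 = (raw_moment(values, k) for k in (1, 2, 3))
    chi = Fraction(N - n, N - 1)
    chi_prime = Fraction(N - 2 * n, N - 2)
    second = chi / n * (m2 - m1 ** 2)
    third = chi * chi_prime / n ** 2 * (m3 - 3 * m2 * m1 + 2 * m1 ** 3)
    return second, third


# Hypothetical population of x-values; y = x^2 is the variate averaged.
y = [x * x for x in (1, 2, 3, 5, 8)]
for n in (1, 2, 3, 4):
    print(n,
          enumerated_mean_power(y, n, 2), enumerated_mean_power(y, n, 3),
          formula_second_and_third(y, n))
```

For each n the enumerated values and the values given by the formulae should coincide exactly, since rational arithmetic is used throughout.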


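C. Let there be 8 independent variables $x_1, x_2, \ldots, x_7, x_8$.
The mean value of $dp_{12}\,dp_{34}\,dp_{56}\,dp_{78}$

\[ \begin{aligned}
= \frac{\chi\phi}{n^2}\,[&P_{1234}P_{5678} + P_{1256}P_{3478} + P_{1278}P_{3456} - P_{1234}P_{56}P_{78} - P_{5678}P_{12}P_{34}\\
&- P_{1256}P_{34}P_{78} - P_{3478}P_{12}P_{56} - P_{1278}P_{34}P_{56} - P_{3456}P_{12}P_{78} + 3P_{12}P_{34}P_{56}P_{78}]\\
+ \frac{\chi\psi}{n^3}\,[&P_{12345678} + P_{1234}P_{5678} + P_{1256}P_{3478} + P_{1278}P_{3456} - P_{123456}P_{78} - P_{123478}P_{56}\\
&- P_{125678}P_{34} - P_{345678}P_{12}],
\end{aligned} \]

where

\[ \phi = 2 - 4\chi' + 3\chi'' + 4\chi'/n - 6\chi''/n, \]
\[ \psi/n = -1 + 3\chi' - 2\chi'' - 3\chi'/n + 4\chi''/n, \]

and

\[ \chi'' = (N - 3n)/(N - 3). \]

When the sampled population is infinitely great

\[ \psi = \chi = \chi' = \chi'' = 1, \qquad \phi = 1 - \frac{2}{n}. \]

As a particular case,

\[ \text{Mean value of } (dp_{1^2})^4 = \frac{\chi\phi}{n^2}\,(3P_{1^4}^{\,2} - 6P_{1^4}P_{1^2}^{\,2} + 3P_{1^2}^{\,4}) + \frac{\chi\psi}{n^3}\,(P_{1^8} + 3P_{1^4}^{\,2} - 4P_{1^6}P_{1^2}), \]

or in the notation usual for a single variable,

\[ \text{Mean value of } (d\mu'_2)^4 = \frac{\chi\phi}{n^2}\,(3\mu_4'^{\,2} - 6\mu'_4\mu_2'^{\,2} + 3\mu_2'^{\,4}) + \frac{\chi\psi}{n^3}\,(\mu'_8 + 3\mu_4'^{\,2} - 4\mu'_6\mu'_2). \]

When the sampled population is normal the results of (A), (B) and (C) can be immediately
expressed in terms of the correlation coefficients $r_{12}, r_{13}, \ldots, r_{78}$, by means of the
formulae established on p. 138 in the current issue of this journal.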
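The single-variable particular case of (C) can likewise be checked by exhaustive enumeration of samples from a small finite population. This is a sketch under stated assumptions: the reading of $\phi$ and $\psi/n$ used here is a reconstruction of the printed coefficients, the population values are hypothetical, and the population must contain more than three members so that $\chi''$ is defined.

```python
from fractions import Fraction
from itertools import combinations


def raw_moment(values, k):
    """k-th moment of the population about the fixed origin."""
    return Fraction(sum(v ** k for v in values), len(values))


def enumerated_fourth_moment(values, n):
    """Exact mean of (sample moment - population moment)^4 over all
    samples of size n drawn without replacement."""
    target = raw_moment(values, 1)
    samples = list(combinations(values, n))
    return sum((Fraction(sum(s), n) - target) ** 4 for s in samples) / len(samples)


def formula_fourth_moment(values, n):
    """Particular case of (C), with phi and psi as read from the text."""
    N = len(values)
    m1, m2, m3, m4 = (raw_moment(values, k) for k in (1, 2, 3, 4))
    chi = Fraction(N - n, N - 1)
    chi1 = Fraction(N - 2 * n, N - 2)
    chi2 = Fraction(N - 3 * n, N - 3)
    phi = 2 - 4 * chi1 + 3 * chi2 + (4 * chi1 - 6 * chi2) / n
    psi = n * (-1 + 3 * chi1 - 2 * chi2) - 3 * chi1 + 4 * chi2
    first = 3 * m2 ** 2 - 6 * m2 * m1 ** 2 + 3 * m1 ** 4
    second = m4 + 3 * m2 ** 2 - 4 * m3 * m1
    return chi * phi * first / n ** 2 + chi * psi * second / n ** 3


# Hypothetical population of x-values; y = x^2 is the variate averaged.
y = [x * x for x in (1, 2, 3, 5, 8, 11)]
for n in (1, 2, 3, 4, 5):
    print(n, enumerated_fourth_moment(y, n), formula_fourth_moment(y, n))
```

Exact agreement for every sample size is what would be expected if the coefficients have been read correctly; a discrepancy would point to a misreading of $\phi$ or $\psi$ rather than to the enumeration.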
ON THE MATHEMATICAL EXPECTATION OF THE
MOMENTS OF FREQUENCY DISTRIBUTIONS. 


By PROFESSOR AL. A. TCHOUPROFF of Petrograd. 


CHAPTER III 


I 
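(1) Let us put

\[ E\,\frac{1}{N}\sum_{i=1}^{N}[X_i - X_{(N)}]^r = \nu_{r,(N)} \qquad (1). \]

We have $\nu_{1,(N)} = 0$.
Noting that $E\,(X_i - X_{(N)})^r = E\,[X_j - X_{(N)}]^r$,
we find

\[ \nu_{r,(N)} = E\,[X_i - X_{(N)}]^r = E\,[X_1 - X_{(N)}]^r. \]

Replacing $X_1 - X_{(N)}$ by $(X_1 - m_1) - (X_{(N)} - m_1)$, we find

\[ \nu_{r,(N)} = \mu_r + \sum_{h=1}^{r-1}(-1)^h C_r^h\,E\,(X_1 - m_1)^{r-h}(X_{(N)} - m_1)^h + (-1)^r\mu_{r,(N)}. \]

But

\[ [X_{(N)} - m_1]^h = \frac{1}{N^h}\,[(X_1 - m_1) + (N-1)(X_{(N-1)} - m_1)]^h, \]

where

\[ X_{(N-1)} = \frac{1}{N-1}\sum_{i=2}^{N} X_i. \]

Hence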


¢—-1 


ae 
Vr, (N) = Be + 4 OF 2% OPN ~ 1) Men Hn, cw—n) + (— VY be, cnn 


Se = 





ames Ti Cd bet 3 be 1y CP Cf Myr Mh, (N-1) 


+(-1) 7; re |e +3 Ch (N —1)* bea Mn, (w-1) + (N — 1S my, wh (2) 


WN a 1\" r-2 
"> ( ¥) \b 7 Rod (—1)* OC," py—n br, w-1) + (— 1) Hr, v-» 


: 
a 2, (— 1) Cp pra Mh, (N-1) 





On the other hand, replacing $X_1 - X_{(N)}$ by

\[ \frac{1}{N}\,[(N-1)X_1 - (N-1)X_{(N-1)}] = \frac{N-1}{N}\,[X_1 - X_{(N-1)}] \]

we find:

\[ \nu_{r,(N)} = \Big(\frac{N-1}{N}\Big)^{r}\sum_{h=0}^{r}(-1)^h C_r^h\,\mu_{r-h}\,\mu_{h,(N-1)} \qquad (3). \]

Replacing in (2) quantities of type $\mu_{h,(N-1)}$ by their values in terms of the
quantities $\mu$ (cf. Chapter I; formulae (15), (17) and (18)) we obtain:


N-1\" r-l A-1 1 i : 
rnn= (iy) [eS OF Mann Sap apie 2 (- WY Bes Torrey 


2 que S 1 i : 

= op | Mer-th—1 = (W—1y"" pA (—1) Bring Taras, rin (4), 
-1 l 

~ 


; ) 
a = (W—1y* By (— 1) Bying,s T: ane itif 


N-1\"" 
Vor+i, (N) = ( “7 ) a 














1 é ; s 
+ = Ones Peop+i—2h 4 Vope = (—1) Bringj Ta n-isj 
iii i +t itl, 
~' oe Mop—2h . | : > (—1Y Br—ing,j Toner, nin 
Pee 2r+1 i=0 (V-1) 1+ j=0 Is ’ J 
S 1 t 
% may ©, (HD Press Toren} | | 
and, hence, | 
1 
Vy, (N) = he — yl = $ri-) Mr-2 Ho] 
(6). 
1 = 
*™/ Ee fp ON pty — 3 pp fat A us| 4 
As N increases the ratio $\nu_{r,(N)}/\mu_r$ tends in this manner to the limit 1.
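(2) Putting r = 2, 3, 4, 5, 6 in (2) we find without difficulty

\[ \begin{aligned}
\nu_{2,(N)} &= \frac{N-1}{N}\,\mu_2 = \mu_2 - \frac{1}{N}\,\mu_2,\\
\nu_{3,(N)} &= \frac{(N-1)(N-2)}{N^2}\,\mu_3,\\
\nu_{4,(N)} &= \mu_4 - \frac{2}{N}\,[2\mu_4 - 3\mu_2^2] + \frac{3}{N^2}\,[2\mu_4 - 5\mu_2^2] - \frac{3}{N^3}\,[\mu_4 - 3\mu_2^2]\;{}^{*},\\
\nu_{5,(N)} &= \mu_5 - \frac{5}{N}\,[\mu_5 - 2\mu_3\mu_2] + \frac{10}{N^2}\,[\mu_5 - 5\mu_3\mu_2] - \frac{10}{N^3}\,[\mu_5 - 8\mu_3\mu_2]
 + \frac{4}{N^4}\,[\mu_5 - 10\mu_3\mu_2],\\
\nu_{6,(N)} &= \mu_6 - \frac{3}{N}\,[2\mu_6 - 5\mu_4\mu_2] + \frac{5}{N^2}\,[3\mu_6 - 15\mu_4\mu_2 - 4\mu_3^2 + 9\mu_2^3]\\
&\quad - \frac{5}{N^3}\,[4\mu_6 - 33\mu_4\mu_2 - 16\mu_3^2 + 42\mu_2^3] + \frac{5}{N^4}\,[3\mu_6 - 36\mu_4\mu_2 - 22\mu_3^2 + 63\mu_2^3]\\
&\quad - \frac{5}{N^5}\,[\mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3]
\end{aligned} \qquad (7). \]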

* For footnote see page 187. 


























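(3) If the law of distribution of the values of X satisfies the condition:

\[ \mu_{2i+1} = 0, \qquad i = 0, 1, 2, \ldots \infty, \]

then, as we saw above, $\mu_{2i+1,(N)} = 0$, $i = 0, 1, 2, \ldots \infty$, and, as appears from (5),
$\nu_{2i+1,(N)} = 0$, $i = 0, 1, 2, \ldots \infty$. When the law of distribution of the values of X is
Gaussian and $\mu_{2i} = 1.3.5\ldots(2i-1)\,\mu_2^{\,i}$, $i = 1, 2, \ldots \infty$, then, as we have seen,
$\mu_{2i,(N)} = 1.3.5\ldots(2i-1)\,\mu_2^{\,i}/N^i$; and, consequently,

\[ \begin{aligned}
\nu_{2r,(N)} &= \Big(\frac{N-1}{N}\Big)^{2r}\sum_{h=0}^{r} C_{2r}^{2h}\,\mu_{2r-2h}\,\mu_{2h,(N-1)}\\
&= \Big(\frac{N-1}{N}\Big)^{2r}\sum_{h=0}^{r} C_{2r}^{2h}\,\frac{1.3.5\ldots(2r-2h-1)\;1.3.5\ldots(2h-1)}{(N-1)^h}\,\mu_2^{\,r}\\
&= \Big(\frac{N-1}{N}\Big)^{r}\mu_{2r} = (N-1)^r\,\mu_{2r,(N)}
\end{aligned} \qquad (8). \]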
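II

(1) Let us put

\[ \nu'_{r,(N)} = \frac{1}{N}\sum_{i=1}^{N}[X_i - X_{(N)}]^r, \qquad
 W^{(m)}_{r,(N)} = E\,\Big\{\sum_{i=1}^{N}[X_i - X_{(N)}]^r\Big\}^m \qquad (9). \]

We have:

\[ \begin{aligned}
E\,\nu'_{r,(N)} &= \nu_{r,(N)} = \frac{1}{N}\,W^{(1)}_{r,(N)},\\
E\,[\nu'_{r,(N)}]^m &= \frac{1}{N^m}\,W^{(m)}_{r,(N)},\\
E\,[\nu'_{r,(N)} - \nu_{r,(N)}]^m &= \sum_{h=0}^{m}(-1)^h\,\frac{1}{N^{m-h}}\,C_m^h\,W^{(m-h)}_{r,(N)}\,\nu^{\,h}_{r,(N)}
\end{aligned} \qquad (10). \]

(2) When m = 2 we find:

\[ \begin{aligned}
W^{(2)}_{r,(N)} &= E\,\Big\{\sum_{i=1}^{N}[X_i - X_{(N)}]^r\Big\}^2\\
&= E\sum_{i=1}^{N}[X_i - X_{(N)}]^{2r} + E\sum_{i=1}^{N}\sum_{j\neq i}[X_i - X_{(N)}]^r[X_j - X_{(N)}]^r\\
&= N\nu_{2r,(N)} + N(N-1)\,E\,[X_1 - X_{(N)}]^r[X_2 - X_{(N)}]^r
\end{aligned} \qquad (11). \]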

* The quantity v4 () can also be written in the form 


(N-1)(N-2 


) N-1)(N-3 an. a(N- Dy 
14.) —— Lali (44 — Sez +3 


R. Henderson ("Frequency Curves and Moments," J. I. A., 1907), in giving correctly the values
$\mu_{2,(N)}$, $\mu_{3,(N)}$, $\mu_{4,(N)}$, $\nu_{2,(N)}$ and $\nu_{3,(N)}$, erroneously gives (p. 435):
N-1)(N-2 (N.-1) (N- 3) 
n= ss Y g- dh (44 — 3p9%). 
-1) 






fg? 


The true value of v4,(y) exceeds the value obtained by Henderson by i Ha". 





N2 
Putting

\[ X_{(N-2)} = \frac{1}{N-2}\sum_{i=3}^{N} X_i, \]

we have

\[ \begin{aligned}
[X_1 - X_{(N)}][X_2 - X_{(N)}] &= \frac{1}{N^2}\,\{[(N-1)(X_1 - m_1) - (X_2 - m_1) - (N-2)(X_{(N-2)} - m_1)]\\
&\qquad\quad \times[(N-1)(X_2 - m_1) - (X_1 - m_1) - (N-2)(X_{(N-2)} - m_1)]\}\\
&= f(X_1, X_2) - \Big(\frac{N-2}{N}\Big)^2[(X_1 - m_1) + (X_2 - m_1)][X_{(N-2)} - m_1] + \Big(\frac{N-2}{N}\Big)^2[X_{(N-2)} - m_1]^2,
\end{aligned} \]

where

\[ f(X_1, X_2) = [X_1 - m_1][X_2 - m_1] - \frac{N-1}{N^2}\,\{(X_1 - m_1) + (X_2 - m_1)\}^2. \]


Hence: 
E[(X,-— Xl [X2.-X wl 


=E(f(X,, X)y+ = (- 1 ChE( f(X,,X,)}""* ( 





N —2\* 

r) 
x {((X, — m) + (X,— m) | [X ww-2) = m, | = [X (w-2) = m,]}*}* 
N —2\? . 

=F (f(X1. Xa) +r (=) 2, (N-2) E {f(X, X,)}"" 


+2 3 ctor SEY wear Bf (XQ) [(Xi—my)+(X.—m) PP. 
h=2 k=h 4 


But (see Chapter I (15) and (16)) 

Ent. (2)-1 1 
> 
i=0 (N- gyknt. (5) +i j=0 
. N-1)/,, 

E{f(X,, X;)}* => 2 (- iy ¢ Nit y Cc, C3, Mons—9 Bs—f+g> 


E {f(X;, X,)}** (xX, —m,) + (X,—- m,) P*-* 





Pk, (N-2) = 


xe 1Y Bane, (2) -t44.5%, Ent. (E)-i+3" 








s—h 2h+2f—k N-1) 
ge * (-1 07_ 1 Cnsy-s Si  _ Ms+htf—k—g Pe—h—f+g- 
Hence: 
r 7 r N- 1 
N(N-1)E [X,- x wy] [X, -Xm)= == 2 (=i ye — C Cop Mrss-9 Mr—f+g 
oa B HX (ry NEDA WH or 
+ U, Ma 2 (- 1) Nit C. 1 Coy Mraisfeg Pr-i-f+g 


* The development of $E\,\{f(X_1, X_2)\}^{r-k}[(X_1 - m_1) + (X_2 - m_1)]^{k-h}$ in powers of N does not contain
any terms of higher degree than $N^0$, while at the same time the development of $\mu_{k,(N-2)}$ contains no
terms of higher degree than $1/N^{\mathrm{Ent.}\left(\frac{k+1}{2}\right)}$. To obtain
$E\,[\nu'_{r,(N)} - \nu_{r,(N)}]^2$ correct to $1/N^t$ it is consequently sufficient to carry our calculations
as far as k = 2t.
ra? ote N-1Y"(N-2
+ Cf pw, Zz s. (-1yY “alle Cis ( aa has Me-2-f+9 


> Set Sa 


k 

er Ent. (5)-1 é 

k=3 i=0  j=0,_ pip REL S=0 90 
i 


=" "3 wis Hg e 1) co ls ‘oid 


r-h ~ 2h+2f-k 


(NW — 1)" (WV — 2)%~ Bat. (FF)-i 
x Ng “Bant, (3)-s +5,J T, Bat. (5) -i+ Preh+f—k—g Pr—h—JS+g 





= N? po? — N (2 + 203 (orgs ra + be) — Of wpa le — 207? [Me pps + Hea] Ha} 
+ { CP [Apes Mra + 4u,7] + CP [2pp42 Meme + Bors Myr + Gy," 
— BC, pip la 2C O's [Mp bea + Mpa] be 
— 2C2 OW. [Meta Mean + Ape ye + 3p? p1] fe — 1407? pe pe Fe 
— 14C,? wy fe — 407 Mreoi Pr—2 Bs 
— 20,3 [Hy ys + Bpy—r M2] Hs + 3C,? yw,» pe? + 1803 [ora Mens + fpna] os” 
+ 60,8 [py Mpa + Sporn Mrs + 3p*,-2] H2*} +... 


Substituting in (11) and replacing $\nu_{2r,(N)}$ in (11) by its value in terms of the
$\mu$'s by formula (4), we find after reduction:
fae = Nye + WN {pee — (20 +1) py? — 20 py Mya 
+7? pp Me +7 (1 — 1) fy Myo He} 
s |2r a — 9 (27-1) pope Me — 49? ppg Mpa — 7(37 + 1) we? — (7 — 1) pgs Mee 


+ Br? wy Met 7 (7 — 1) (40 +1) py hye Me + ppg Mrs Me -(12), 


+ (r = 1) Br-i Pr—-2 Ms 
yl-3] c 
+ a Mee Bers bs — B98 (r — 1) yn oa? — 9° (7 — 1) (7 — 2) fp Ms Me 





ri-4] 
= Op He brant ae eee 


and hence 


1 we) 2 ) 
E[v’,, oe) — oP = 72 Way — Po 


1 a 
“NV {Moe — Mig? — 29° pyr fpr + 1° 7p fe} 


ar Wi {2r Por — 7 (21 — 1) pops Ha — 7 (7 — 1) Heys Mypoe — 47° Prt Bro + (13). 
+ 79) pps Pps fe 
— P(r +2) pP+2 (r+ 1) 7 (7-1) py oye Mot 39? wn oe + °(7—1) pan Meme Ms 


2 =} 2 
—r(r —1) (7 — 2) py py —s pe? — nO pean er. 
(3) When m = 3 and m = 4, we have similarly:

\[
\begin{aligned}
W^{(3)}_{r,(N)} &= N\nu_{3r,(N)} + 3N(N-1)\,E[X_1 - X_{(N)}]^{2r}[X_2 - X_{(N)}]^{r} \\
&\quad + N(N-1)(N-2)\,E[X_1 - X_{(N)}]^{r}[X_2 - X_{(N)}]^{r}[X_3 - X_{(N)}]^{r}, \\
W^{(4)}_{r,(N)} &= N\nu_{4r,(N)} + 4N(N-1)\,E[X_1 - X_{(N)}]^{3r}[X_2 - X_{(N)}]^{r} \\
&\quad + 3N(N-1)\,E[X_1 - X_{(N)}]^{2r}[X_2 - X_{(N)}]^{2r} \\
&\quad + 6N(N-1)(N-2)\,E[X_1 - X_{(N)}]^{2r}[X_2 - X_{(N)}]^{r}[X_3 - X_{(N)}]^{r} \\
&\quad + N(N-1)(N-2)(N-3)\,E[X_1 - X_{(N)}]^{r}[X_2 - X_{(N)}]^{r}[X_3 - X_{(N)}]^{r}[X_4 - X_{(N)}]^{r}.
\end{aligned}
\]

Determining $W^{(3)}_{r,(N)}$ and $W^{(4)}_{r,(N)}$ from these relations and substituting the values found in (10), we obtain $E[\nu'_{r,(N)} - \nu_{r,(N)}]^3$ and $E[\nu'_{r,(N)} - \nu_{r,(N)}]^4$. Their expressions exhibit no special difficulties, but are so unwieldy that I do not give them here, contenting myself with the deduction of $E[\nu'_{r,(N)} - \nu_{r,(N)}]^3$ and $E[\nu'_{r,(N)} - \nu_{r,(N)}]^4$ which will be shown below (see Chapter IV, § III).


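III

(1) Noting that

\[
\sum_{i=1}^{N}[X_i - X_{(N)}]^2 = \sum_{i=1}^{N}(X_i - m_1)^2 - N[X_{(N)} - m_1]^2 = \sum_{i=1}^{N}(X_i - m_1)^2 - \frac{1}{N}\Big[\sum_{i=1}^{N}(X_i - m_1)\Big]^2,
\]

and putting

\[
\begin{aligned}
U^{(r,s)}_{2,(N)} &= E\Big[\sum_{i=1}^{N}(X_i - m_1)^2\Big]^{r}\Big[\sum_{i=1}^{N}(X_i - m_1)\Big]^{s}, \\
Z^{(r,s)}_{2,(N)} &= E\Big[\sum_{i=1}^{N}(X_i - X_{(N)})^2\Big]^{r}\Big[\sum_{i=1}^{N}(X_i - m_1)\Big]^{s},
\end{aligned}
\qquad (14)
\]

we find

\[
W^{(m)}_{2,(N)} = E\Big\{\sum_{i=1}^{N}(X_i - X_{(N)})^2\Big\}^{m}
= V^{(m)}_{2,(N)} + \sum_{h=1}^{m-1}(-1)^h \frac{1}{N^h} C_m^h\, U^{(m-h,\,2h)}_{2,(N)} + (-1)^m N^m \mu_{2m,(N)}, \qquad (15)
\]

\[
V^{(m)}_{2,(N)} = E\Big\{\sum_{i=1}^{N}(X_i - m_1)^2\Big\}^{m}
= W^{(m)}_{2,(N)} + \sum_{h=1}^{m-1}\frac{1}{N^h} C_m^h\, Z^{(m-h,\,2h)}_{2,(N)} + N^m \mu_{2m,(N)},
\]

or

\[
W^{(m)}_{2,(N)} = V^{(m)}_{2,(N)} - \sum_{h=1}^{m-1}\frac{1}{N^h} C_m^h\, Z^{(m-h,\,2h)}_{2,(N)} - N^m \mu_{2m,(N)}, \qquad (16)
\]

\[
\begin{aligned}
U^{(r,s)}_{2,(N)} &= \sum_{h=0}^{r-1}\frac{1}{N^h} C_r^h\, Z^{(r-h,\,s+2h)}_{2,(N)} + N^{r+s}\mu_{2r+s,(N)}, \\
Z^{(r,s)}_{2,(N)} &= \sum_{h=0}^{r-1}(-1)^h\frac{1}{N^h} C_r^h\, U^{(r-h,\,s+2h)}_{2,(N)} + (-1)^r N^{r+s}\mu_{2r+s,(N)},
\end{aligned}
\]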
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whence: 


"SOF alacant + (1 USSR) + pangs an [1 + (- 11 =0...(07). 


(2) Noting that 
r N 8 
god =F [x — mm) + = (X;- m)| [x —m)+ = (X;- m)| 


é—1 : s 
= Pores + =~ C; Morss—j (WV — 1) py,, (N=1) + Mar (WV — 1) Ms, (N-1) 


rl 


s (h, 9) 
+3 0 Por+s—2h sy wt, = = c c? Por+s—2h—j U8 (N-1) 


3 (h, (r) Ss 7 (7,5) (r, 8) 
bi 20 a U, 2. 1) + Bs Vs w-1) + = C Ms-j U.. (N-1) + U. (N=1)’ 
we find: 
(r, ) ( ’ ) = j 
Uy a7 A -1)— ”* OF Parse) (¥- 1) #j,(N=1) + = 0 Mor+s—2h ee (N-1) 


hj (h, 8) 
+3 Be CO tenagten yt: = Wants Com 
J 


and hence: 
ma £ 4 “ z fii u vy) 
U 2, an C, Marsenj > (V-fy Hy N-f) + a Pat 2,(N-f) 
rl N és N (h, 3) 
+2 COC pna t Tran = = OC, Cains & Von 
h=1 fe=l : f=1 
Putting here (see Chapter I (14) and Chapter II (10)) 


Ent. (2)-1 yl ()- a 


N . 
2 (N-f¥mw-n= aa Tj, unt. (Z)-i, 
hy ~_ Ent. (3)-« —i+1 
2) 
3 yo _ & NEM wy 





2 ix. t+1 si 
and noting that 


(0, 8) 
Us (Nn) = N* Bs,(N)> 


ue 0) =v 


2,(N) — 2, (NV)? 
UN? = N SC pese-n (N 1) 
2 anon Meise y= "9 Pat2-h Ph, (N-1) 


Ent.(5) 

2 2 4 

= N per + > Nev > C, Hsi2—h Th, js 
g=1 ° h=2j 





Vol. v1, p. 3), but unfortunately misprints have crept into the formula. 
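we find in consequence:

\[
\begin{aligned}
U^{(1,1)}_{2,(N)} &= N\mu_3, \\
U^{(1,2)}_{2,(N)} &= N\mu_4 + N^{[-2]}\mu_2^2, \\
U^{(1,3)}_{2,(N)} &= N\mu_5 + 4N^{[-2]}\mu_3\mu_2, \\
U^{(1,4)}_{2,(N)} &= N\mu_6 + 7N^{[-2]}\mu_4\mu_2 + 4N^{[-2]}\mu_3^2 + 3N^{[-3]}\mu_2^3, \\
U^{(1,6)}_{2,(N)} &= N\mu_8 + N^{[-2]}[16\mu_6\mu_2 + 26\mu_5\mu_3 + 15\mu_4^2] \\
&\quad + N^{[-3]}[60\mu_4\mu_2^2 + 70\mu_3^2\mu_2] + 15N^{[-4]}\mu_2^4, \\
U^{(2,1)}_{2,(N)} &= N\mu_5 + 2N^{[-2]}\mu_3\mu_2, \\
U^{(2,2)}_{2,(N)} &= N\mu_6 + 3N^{[-2]}\mu_4\mu_2 + 2N^{[-2]}\mu_3^2 + N^{[-3]}\mu_2^3, \\
U^{(2,4)}_{2,(N)} &= N\mu_8 + N^{[-2]}[8\mu_6\mu_2 + 12\mu_5\mu_3 + 7\mu_4^2] \\
&\quad + N^{[-3]}[16\mu_4\mu_2^2 + 20\mu_3^2\mu_2] + 3N^{[-4]}\mu_2^4, \\
U^{(3,2)}_{2,(N)} &= N\mu_8 + N^{[-2]}[4\mu_6\mu_2 + 6\mu_5\mu_3 + 3\mu_4^2] \\
&\quad + N^{[-3]}[6\mu_4\mu_2^2 + 6\mu_3^2\mu_2] + N^{[-4]}\mu_2^4.
\end{aligned}
\]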


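(3) Substituting the values we have found for $U^{(r,s)}_{2,(N)}$ in (15) we obtain

\[
\begin{aligned}
W^{(1)}_{2,(N)} &= (N-1)\mu_2, \\
W^{(2)}_{2,(N)} &= \frac{N-1}{N}\{(N-1)\mu_4 + (N^2 - 2N + 3)\mu_2^2\} \\
&= N^2\mu_2^2 + N[\mu_4 - 3\mu_2^2] - [2\mu_4 - 5\mu_2^2] + \frac{1}{N}[\mu_4 - 3\mu_2^2],
\end{aligned}
\qquad (19)
\]

\[
\begin{aligned}
W^{(3)}_{2,(N)} &= \frac{N-1}{N^2}\{(N-1)^2\mu_6 + 3(N-1)(N^2 - 2N + 5)\mu_4\mu_2 \\
&\qquad - 2(3N^2 - 6N + 5)\mu_3^2 + (N-2)(N^3 - 3N^2 + 9N - 15)\mu_2^3\} \\
&= N^3\mu_2^3 + N^2(3\mu_4\mu_2 - 6\mu_2^3) + N[\mu_6 - 12\mu_4\mu_2 - 6\mu_3^2 + 20\mu_2^3] \\
&\quad - [3\mu_6 - 30\mu_4\mu_2 - 18\mu_3^2 + 48\mu_2^3] + \frac{1}{N}[3\mu_6 - 36\mu_4\mu_2 - 22\mu_3^2 + 63\mu_2^3] \\
&\quad - \frac{1}{N^2}[\mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3],
\end{aligned}
\qquad (20)
\]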





* The value of $W^{(2)}_{2,(N)}$ is given in Student's paper "The Probable Error of a Mean" (Biometrika, Vol. VI, p. 3), but unfortunately misprints have crept into the formula.
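\[
\begin{aligned}
W^{(4)}_{2,(N)} &= \frac{N-1}{N^3}\{(N-1)^3\mu_8 + 4(N-1)^2(N^2 - 2N + 7)\mu_6\mu_2 \\
&\qquad - (N-1)(24N^2 - 48N + 56)\mu_5\mu_3 \\
&\qquad + 6(N-2)(N^4 - 4N^3 + 16N^2 - 40N + 35)\mu_4\mu_2^2 \\
&\qquad - (N-2)(24N^3 - 120N^2 + 280N - 280)\mu_3^2\mu_2 \\
&\qquad + (3N^4 - 12N^3 + 42N^2 - 60N + 35)\mu_4^2 \\
&\qquad + (N-2)(N-3)(N^4 - 4N^3 + 18N^2 - 60N + 105)\mu_2^4\} \\
&= N^4\mu_2^4 + N^3[6\mu_4\mu_2^2 - 10\mu_2^4] \\
&\quad + N^2[4\mu_6\mu_2 + 3\mu_4^2 - 42\mu_4\mu_2^2 - 24\mu_3^2\mu_2 + 53\mu_2^4] \\
&\quad + N[\mu_8 - 20\mu_6\mu_2 - 15\mu_4^2 - 24\mu_5\mu_3 + 180\mu_4\mu_2^2 + 192\mu_3^2\mu_2 - 218\mu_2^4] \\
&\quad - [4\mu_8 - 64\mu_6\mu_2 - 54\mu_4^2 - 96\mu_5\mu_3 + 576\mu_4\mu_2^2 + 688\mu_3^2\mu_2 - 687\mu_2^4] \\
&\quad + \frac{1}{N}[6\mu_8 - 112\mu_6\mu_2 - 102\mu_4^2 - 176\mu_5\mu_3 + 1122\mu_4\mu_2^2 + 1360\mu_3^2\mu_2 - 1398\mu_2^4] \\
&\quad - \frac{1}{N^2}[4\mu_8 - 92\mu_6\mu_2 - 95\mu_4^2 - 160\mu_5\mu_3 + 1110\mu_4\mu_2^2 + 1400\mu_3^2\mu_2 - 1515\mu_2^4] \\
&\quad + \frac{1}{N^3}[\mu_8 - 28\mu_6\mu_2 - 35\mu_4^2 - 56\mu_5\mu_3 + 420\mu_4\mu_2^2 + 560\mu_3^2\mu_2 - 630\mu_2^4].
\end{aligned}
\qquad (21)
\]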


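In the case when the law of distribution of the variable X follows the Gauss-Laplace law:

\[
\begin{aligned}
W^{(1)}_{2,(N)} &= (N-1)\mu_2 = V^{(1)}_{2,(N-1)}, \\
W^{(2)}_{2,(N)} &= (N-1)(N+1)\mu_2^2 = V^{(2)}_{2,(N-1)}, \\
W^{(3)}_{2,(N)} &= (N-1)(N+1)(N+3)\mu_2^3 = V^{(3)}_{2,(N-1)}, \\
W^{(4)}_{2,(N)} &= (N-1)(N+1)(N+3)(N+5)\mu_2^4 = V^{(4)}_{2,(N-1)}.
\end{aligned}
\qquad (22)
\]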
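The reduction to the Gauss-Laplace case can be checked mechanically. The sketch below is only an illustration, not part of the original memoir: it assumes the closed forms of (19), (20) and (21) read as coded in `w_closed_forms` (the function names are hypothetical), inserts the Gaussian moments $\mu_{2k} = 1\cdot3\cdots(2k-1)\,\mu_2^k$, $\mu_{2k+1} = 0$, and compares the result with the products $(N-1)(N+1)\cdots$ of (22), using exact rational arithmetic.

```python
from fractions import Fraction


def gaussian_central_moment(order, variance):
    """Central moment of a Gauss-Laplace law: 0 for odd orders,
    1*3*5*...*(order-1) * variance**(order/2) for even orders."""
    if order % 2 == 1:
        return Fraction(0)
    value = Fraction(1)
    for odd in range(1, order, 2):
        value *= odd
    return value * variance ** (order // 2)


def w_closed_forms(n, mu):
    """W^(1)..W^(4) for sample size n, as read (an assumption) from (19)-(21).
    mu[k] must hold the k-th central moment of X for k = 0..8."""
    n = Fraction(n)
    w1 = (n - 1) * mu[2]
    w2 = (n - 1) / n * ((n - 1) * mu[4] + (n**2 - 2*n + 3) * mu[2]**2)
    w3 = (n - 1) / n**2 * (
        (n - 1)**2 * mu[6]
        + 3 * (n - 1) * (n**2 - 2*n + 5) * mu[4] * mu[2]
        - 2 * (3*n**2 - 6*n + 5) * mu[3]**2
        + (n - 2) * (n**3 - 3*n**2 + 9*n - 15) * mu[2]**3
    )
    w4 = (n - 1) / n**3 * (
        (n - 1)**3 * mu[8]
        + 4 * (n - 1)**2 * (n**2 - 2*n + 7) * mu[6] * mu[2]
        - (n - 1) * (24*n**2 - 48*n + 56) * mu[5] * mu[3]
        + 6 * (n - 2) * (n**4 - 4*n**3 + 16*n**2 - 40*n + 35) * mu[4] * mu[2]**2
        - (n - 2) * (24*n**3 - 120*n**2 + 280*n - 280) * mu[3]**2 * mu[2]
        + (3*n**4 - 12*n**3 + 42*n**2 - 60*n + 35) * mu[4]**2
        + (n - 2) * (n - 3) * (n**4 - 4*n**3 + 18*n**2 - 60*n + 105) * mu[2]**4
    )
    return [w1, w2, w3, w4]


def gaussian_product(n, m, variance):
    """(N-1)(N+1)...(N+2m-3) * variance**m, the right-hand side of (22)."""
    result = variance ** m
    for j in range(m):
        result *= n - 1 + 2 * j
    return result
```

Calling `w_closed_forms(n, mu)` with `mu = [gaussian_central_moment(k, variance) for k in range(9)]` should reproduce `gaussian_product(n, m, variance)` for m = 1, 2, 3, 4.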


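(4) Substituting the values found for $W^{(1)}_{2,(N)}$, $W^{(2)}_{2,(N)}$, $W^{(3)}_{2,(N)}$ and $W^{(4)}_{2,(N)}$ in (10), we find:

\[
\begin{aligned}
E[\nu'_{2,(N)} - \nu_{2,(N)}]^2 &= \frac{(N-1)^2}{N^3}\mu_4 - \frac{(N-1)(N-3)}{N^3}\mu_2^2 \\
&= \frac{1}{N}[\mu_4 - \mu_2^2] - \frac{2}{N^2}[\mu_4 - 2\mu_2^2] + \frac{1}{N^3}[\mu_4 - 3\mu_2^2],
\end{aligned}
\qquad (23)
\]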


* Cf. Student, "The Probable Error of a Mean" (Biometrika, Vol. VI).
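\[
\begin{aligned}
E[\nu'_{2,(N)} - \nu_{2,(N)}]^3 &= \frac{(N-1)^3}{N^5}\mu_6 - \frac{3(N-1)^2(N-5)}{N^5}\mu_4\mu_2 \\
&\quad - \frac{2(N-1)(3N^2 - 6N + 5)}{N^5}\mu_3^2 + \frac{2(N-1)(N^2 - 12N + 15)}{N^5}\mu_2^3 \\
&= \frac{1}{N^2}[\mu_6 - 3\mu_4\mu_2 - 6\mu_3^2 + 2\mu_2^3] - \frac{1}{N^3}[3\mu_6 - 21\mu_4\mu_2 - 18\mu_3^2 + 26\mu_2^3] \\
&\quad + \frac{1}{N^4}[3\mu_6 - 33\mu_4\mu_2 - 22\mu_3^2 + 54\mu_2^3] - \frac{1}{N^5}[\mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3],
\end{aligned}
\qquad (24)
\]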
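\[
\begin{aligned}
E[\nu'_{2,(N)} - \nu_{2,(N)}]^4 &= \frac{3}{N^2}[\mu_4 - \mu_2^2]^2 \\
&\quad + \frac{1}{N^3}[\mu_8 - 4\mu_6\mu_2 - 15\mu_4^2 - 24\mu_5\mu_3 + 48\mu_4\mu_2^2 + 96\mu_3^2\mu_2 - 30\mu_2^4] \\
&\quad - \frac{1}{N^4}[4\mu_8 - 40\mu_6\mu_2 - 54\mu_4^2 - 96\mu_5\mu_3 + 336\mu_4\mu_2^2 + 528\mu_3^2\mu_2 - 306\mu_2^4] \\
&\quad + \frac{1}{N^5}[6\mu_8 - 96\mu_6\mu_2 - 102\mu_4^2 - 176\mu_5\mu_3 + 924\mu_4\mu_2^2 + 1232\mu_3^2\mu_2 - 1044\mu_2^4] \\
&\quad - \frac{1}{N^6}[4\mu_8 - 88\mu_6\mu_2 - 95\mu_4^2 - 160\mu_5\mu_3 + 1050\mu_4\mu_2^2 + 1360\mu_3^2\mu_2 - 1395\mu_2^4] \\
&\quad + \frac{1}{N^7}[\mu_8 - 28\mu_6\mu_2 - 35\mu_4^2 - 56\mu_5\mu_3 + 420\mu_4\mu_2^2 + 560\mu_3^2\mu_2 - 630\mu_2^4].
\end{aligned}
\qquad (25)
\]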
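In the case when the law of distribution of values of X follows the Gauss-Laplace law:

\[
\begin{aligned}
E[\nu'_{2,(N)} - \nu_{2,(N)}]^2 &= \frac{2(N-1)}{N^2}\mu_2^2, \\
E[\nu'_{2,(N)} - \nu_{2,(N)}]^3 &= \frac{8(N-1)}{N^3}\mu_2^3, \\
E[\nu'_{2,(N)} - \nu_{2,(N)}]^4 &= \frac{12(N-1)(N+3)}{N^4}\mu_2^4, \\
\frac{E[\nu'_{2,(N)} - \nu_{2,(N)}]^4}{\{E[\nu'_{2,(N)} - \nu_{2,(N)}]^2\}^2} &= 3\,\frac{N+3}{N-1}.
\end{aligned}
\]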
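Because (23) and (24) are exact for every N and every law of distribution with finite moments, they can be tested by brute force on a small discrete variable. The following is a minimal sketch under stated assumptions: the closed forms are taken as $E[\nu'_{2,(N)} - \nu_{2,(N)}]^2 = \{(N-1)^2\mu_4 - (N-1)(N-3)\mu_2^2\}/N^3$ and the four-term expression in $\mu_6$, $\mu_4\mu_2$, $\mu_3^2$, $\mu_2^3$ over $N^5$; the three-valued distribution and all function names are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product


def central_moments(values, probs, max_order):
    """Return [mu_0, mu_1, ..., mu_max_order] about the true mean."""
    mean = sum(p * x for x, p in zip(values, probs))
    return [sum(p * (x - mean) ** k for x, p in zip(values, probs))
            for k in range(max_order + 1)]


def exact_central_moment_of_nu2(values, probs, n, power):
    """E[(nu2' - E nu2')**power], enumerating every ordered sample of size n.
    nu2' is the second sample moment about the sample mean (divisor n)."""
    outcomes = []
    for sample in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for index in sample:
            weight *= probs[index]
        xs = [values[index] for index in sample]
        sample_mean = sum(xs) / n
        nu2 = sum((x - sample_mean) ** 2 for x in xs) / n
        outcomes.append((weight, nu2))
    expectation = sum(w * v for w, v in outcomes)
    return sum(w * (v - expectation) ** power for w, v in outcomes)


def formula_23(n, mu):
    """Closed form assumed for the second central moment of nu2'."""
    n = Fraction(n)
    return ((n - 1)**2 * mu[4] - (n - 1) * (n - 3) * mu[2]**2) / n**3


def formula_24(n, mu):
    """Closed form assumed for the third central moment of nu2'."""
    n = Fraction(n)
    return ((n - 1)**3 * mu[6]
            - 3 * (n - 1)**2 * (n - 5) * mu[4] * mu[2]
            - 2 * (n - 1) * (3*n**2 - 6*n + 5) * mu[3]**2
            + 2 * (n - 1) * (n**2 - 12*n + 15) * mu[2]**3) / n**5
```

With values such as 0, 1, 3 and probabilities 1/2, 1/3, 1/6 (entered as `Fraction` objects), the enumerated moments and the closed forms should agree exactly for small N.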








CHAPTER IV 
I 


(1) We may also follow another road to deduce the formulae obtained above, a road nearer to the one usual in English literature.

Let us denote by $n_j$ the number of times in N experiments the variable X takes the value $\xi_j$, one of its k possible values (cf. above, Chapter I, § I), and putting

\[ p'_j = \frac{n_j}{N}, \]





AL. A. TCHOUPROFF





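we have:

\[
\begin{aligned}
N &= \sum_{j=1}^{k} n_j, \qquad \sum_{j=1}^{k} p'_j = 1, \\
m_r &= \sum_{j=1}^{k} p_j\,\xi_j^{\,r}, \qquad m'_r = \sum_{j=1}^{k} p'_j\,\xi_j^{\,r}, \\
\mu_r &= \sum_{j=1}^{k} p_j(\xi_j - m_1)^r, \qquad \mu'_r = \sum_{j=1}^{k} p'_j[\xi_j - m_1]^r, \\
X_{(N)} &= \sum_{j=1}^{k} p'_j\,\xi_j = m'_1.
\end{aligned}
\]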
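But

\[
\begin{aligned}
E n_i &= N p_i, \\
E n_i^2 &= N p_i + N^{[-2]} p_i^2 = N^2 p_i^2 + N p_i(1 - p_i), \\
E n_i^3 &= N p_i + 3N^{[-2]} p_i^2 + N^{[-3]} p_i^3 \\
&= N^3 p_i^3 + 3N^2 p_i^2(1 - p_i) + N p_i(1 - p_i)(1 - 2p_i), \\
E n_i^4 &= N p_i + 7N^{[-2]} p_i^2 + 6N^{[-3]} p_i^3 + N^{[-4]} p_i^4 \\
&= N^4 p_i^4 + 6N^3 p_i^3(1 - p_i) + N^2 p_i^2(1 - p_i)(7 - 11p_i) \\
&\quad + N p_i(1 - p_i)(1 - 6p_i + 6p_i^2),
\end{aligned}
\qquad (1)
\]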








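and in general, as is not difficult to see*,

\[
E n_i^r = \sum_{h=1}^{r} N^{[-h]} a_{r,h}\, p_i^h = \sum_{h=0}^{r-1} N^{r-h} p_i^{r-h} \sum_{f=0}^{h} (-1)^f a_{r,\,r-h+f}\, B_{r-h+f,\,f}\; p_i^f. \qquad (2)
\]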
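The first form of the general relation can be illustrated numerically. The sketch below rests on two assumptions that are not stated in this section: that $N^{[-h]}$ denotes the product $N(N-1)\cdots(N-h+1)$, and that the coefficients $a_{r,h}$ coincide with the Stirling numbers of the second kind (which is consistent with the coefficients 1, 7, 6, 1 appearing in $E n_i^4$). It compares $\sum_h N^{[-h]} a_{r,h} p^h$ with the raw moment obtained by direct summation over the binomial law; all function names are illustrative.

```python
from fractions import Fraction
from math import comb


def stirling_second(r, h):
    """Assumed value of a_{r,h}: Stirling number of the second kind,
    built from the recurrence S(i, j) = j*S(i-1, j) + S(i-1, j-1)."""
    table = [[0] * (h + 1) for _ in range(r + 1)]
    table[0][0] = 1
    for i in range(1, r + 1):
        for j in range(1, min(i, h) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[r][h]


def falling_factorial(n, h):
    """N^[-h] read as N(N-1)...(N-h+1)."""
    result = 1
    for step in range(h):
        result *= n - step
    return result


def raw_moment_formula(n, p, r):
    """E n^r from the factorial-power expansion."""
    return sum(falling_factorial(n, h) * stirling_second(r, h) * p ** h
               for h in range(1, r + 1))


def raw_moment_direct(n, p, r):
    """E n^r by summing k^r against the binomial probabilities."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) * k ** r
               for k in range(n + 1))
```

For instance, `raw_moment_formula(7, Fraction(2, 7), 4)` and `raw_moment_direct(7, Fraction(2, 7), 4)` should return the same rational number.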


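Further, denoting by $P_h$ the probability of $n_i$ taking the value h, and by $E^{(h)}_{n_j}$ the conditional mathematical expectation of $n_j$ on the assumption that $n_i$ takes the value h, we find:

\[
\begin{aligned}
P_h &= C_N^h\, p_i^h (1 - p_i)^{N-h}, \\
E^{(h)}_{n_j} &= (N - h)\frac{p_j}{1 - p_i}, \\
E n_i n_j &= \sum_{h} P_h\, h\, E^{(h)}_{n_j} = \sum_{h} (N - h)\, h\, C_N^h\, p_i^h (1 - p_i)^{N-h-1} p_j = N(N-1)\, p_i p_j.
\end{aligned}
\]

Similarly we obtain:

\[
\begin{aligned}
E n_{i_1} n_{i_2} n_{i_3} &= N(N-1)(N-2)\, p_{i_1} p_{i_2} p_{i_3}, \\
E n_{i_1} n_{i_2} \cdots n_{i_k} &= N(N-1)\cdots(N-k+1)\, p_{i_1} p_{i_2} \cdots p_{i_k}.
\end{aligned}
\qquad (3)
\]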


* See my paper, "On the Mathematical Expectation of a Positive Integral Power of the Difference between the Frequency and the Probability of an Event," in the Proceedings of the Petrograd Polytechnic Institute.
\[
\begin{aligned}
E n_i^2 n_j &= N^{[-2]} p_i p_j + N^{[-3]} p_i^2 p_j \\
&= N^3 p_i^2 p_j + N^2 p_i p_j(1 - 3p_i) - N p_i p_j(1 - 2p_i), \\
E n_i^2 n_j^2 &= N^{[-2]} p_i p_j + N^{[-3]} p_i p_j(p_i + p_j) + N^{[-4]} p_i^2 p_j^2 \\
&= N^4 p_i^2 p_j^2 + N^3 p_i p_j(p_i + p_j - 6p_i p_j) + N^2 p_i p_j(1 - 3p_i - 3p_j + 11p_i p_j) \\
&\quad - N p_i p_j(1 - 2p_i - 2p_j + 6p_i p_j), \\
E n_i^3 n_j &= N^{[-2]} p_i p_j + 3N^{[-3]} p_i^2 p_j + N^{[-4]} p_i^3 p_j \\
&= N^4 p_i^3 p_j + 3N^3 p_i^2 p_j(1 - 2p_i) \\
&\quad + N^2 p_i p_j(1 - 9p_i + 11p_i^2) - N p_i p_j(1 - 6p_i + 6p_i^2), \\
E n_i^2 n_j n_h &= N^{[-3]} p_i p_j p_h + N^{[-4]} p_i^2 p_j p_h \\
&= N^4 p_i^2 p_j p_h + N^3 p_i p_j p_h(1 - 6p_i) - N^2 p_i p_j p_h(3 - 11p_i) \\
&\quad + 2N p_i p_j p_h(1 - 3p_i),
\end{aligned}
\qquad (4)
\]
| 





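\[
E n_i^{r_1} n_j^{r_2} = \sum_{h_1=1}^{r_1}\sum_{h_2=1}^{r_2} N^{[-(h_1+h_2)]}\, a_{r_1,h_1} a_{r_2,h_2}\, p_i^{h_1} p_j^{h_2}, \qquad (5)
\]

and in the general case:

\[
E n_{i_1}^{r_1} n_{i_2}^{r_2} \cdots n_{i_k}^{r_k} = \sum_{h_1=1}^{r_1}\sum_{h_2=1}^{r_2}\cdots\sum_{h_k=1}^{r_k} N^{[-(h_1+h_2+\ldots+h_k)]}\, a_{r_1,h_1} a_{r_2,h_2} \cdots a_{r_k,h_k}\, p_{i_1}^{h_1} p_{i_2}^{h_2} \cdots p_{i_k}^{h_k}.
\]

If the numbers $n_{i_1}, n_{i_2}, \ldots, n_{i_k}$ referred to k series of independent experiments, then we should have:

\[
\begin{aligned}
E n_{i_1}^{r_1} n_{i_2}^{r_2} \cdots n_{i_k}^{r_k} &= E n_{i_1}^{r_1}\, E n_{i_2}^{r_2} \cdots E n_{i_k}^{r_k} \\
&= \sum_{h_1=1}^{r_1}\sum_{h_2=1}^{r_2}\cdots\sum_{h_k=1}^{r_k} N^{[-h_1]} N^{[-h_2]} \cdots N^{[-h_k]}\, a_{r_1,h_1} a_{r_2,h_2} \cdots a_{r_k,h_k}\, p_{i_1}^{h_1} p_{i_2}^{h_2} \cdots p_{i_k}^{h_k}.
\end{aligned}
\]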
“A “ 


(2) Passing from the mathematical expectations of the numbers of repetitions 
to the mathematical expectations of frequencies, we find : 


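\[
\begin{aligned}
E p'_i &= p_i, \\
E p_i'^{\,2} &= p_i^2 + \frac{1}{N} p_i(1 - p_i), \\
E p_i'^{\,3} &= p_i^3 + \frac{3}{N} p_i^2(1 - p_i) + \frac{1}{N^2} p_i(1 - p_i)(1 - 2p_i), \\
E p_i'^{\,4} &= p_i^4 + \frac{6}{N} p_i^3(1 - p_i) + \frac{1}{N^2} p_i^2(1 - p_i)(7 - 11p_i) \\
&\quad + \frac{1}{N^3} p_i(1 - p_i)(1 - 6p_i + 6p_i^2),
\end{aligned}
\qquad (7)
\]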


\[
E p_i'^{\,r} = \sum_{h=0}^{r-1} \frac{1}{N^h}\, p_i^{r-h} \sum_{f=0}^{h} (-1)^f a_{r,\,r-h+f}\, B_{r-h+f,\,f}\; p_i^f, \qquad (8)
\]
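\[
\begin{aligned}
E p'_i p'_j &= p_i p_j - \frac{1}{N} p_i p_j, \\
E p_i'^{\,2} p'_j &= p_i^2 p_j + \frac{1}{N} p_i p_j(1 - 3p_i) - \frac{1}{N^2} p_i p_j(1 - 2p_i), \\
E p'_i p'_j p'_h &= p_i p_j p_h - \frac{3}{N} p_i p_j p_h + \frac{2}{N^2} p_i p_j p_h, \\
E p_i'^{\,3} p'_j &= p_i^3 p_j + \frac{3}{N} p_i^2 p_j(1 - 2p_i) + \frac{1}{N^2} p_i p_j(1 - 9p_i + 11p_i^2) \\
&\quad - \frac{1}{N^3} p_i p_j(1 - 6p_i + 6p_i^2), \\
E p_i'^{\,2} p_j'^{\,2} &= p_i^2 p_j^2 + \frac{1}{N} p_i p_j(p_i + p_j - 6p_i p_j) \\
&\quad + \frac{1}{N^2} p_i p_j(1 - 3p_i - 3p_j + 11p_i p_j) - \frac{1}{N^3} p_i p_j(1 - 2p_i - 2p_j + 6p_i p_j), \\
E p_i'^{\,2} p'_j p'_h &= p_i^2 p_j p_h + \frac{1}{N} p_i p_j p_h(1 - 6p_i) - \frac{1}{N^2} p_i p_j p_h(3 - 11p_i) \\
&\quad + \frac{2}{N^3} p_i p_j p_h(1 - 3p_i), \\
E p'_i p'_j p'_h p'_g &= p_i p_j p_h p_g - \frac{6}{N} p_i p_j p_h p_g + \frac{11}{N^2} p_i p_j p_h p_g - \frac{6}{N^3} p_i p_j p_h p_g,
\end{aligned}
\qquad (9)
\]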
and hence:

\[
\begin{aligned}
E(p'_i - p_i) &= 0, \\
E(p'_i - p_i)^2 &= \frac{1}{N} p_i(1 - p_i), \\
E(p'_i - p_i)^3 &= \frac{1}{N^2} p_i(1 - p_i)(1 - 2p_i), \\
E(p'_i - p_i)^4 &= \frac{3}{N^2} p_i^2(1 - p_i)^2 + \frac{1}{N^3} p_i(1 - p_i)(1 - 6p_i + 6p_i^2), \\
E(p'_i - p_i)(p'_j - p_j) &= -\frac{1}{N} p_i p_j, \\
E(p'_i - p_i)^2(p'_j - p_j) &= -\frac{1}{N^2} p_i p_j(1 - 2p_i), \\
E(p'_i - p_i)(p'_j - p_j)(p'_h - p_h) &= \frac{2}{N^2} p_i p_j p_h, \\
E(p'_i - p_i)^3(p'_j - p_j) &= -\frac{3}{N^2} p_i^2 p_j(1 - p_i) - \frac{1}{N^3} p_i p_j(1 - 6p_i + 6p_i^2), \\
E(p'_i - p_i)^2(p'_j - p_j)^2 &= \frac{1}{N^2} p_i p_j(1 - p_i - p_j + 3p_i p_j) \\
&\quad - \frac{1}{N^3} p_i p_j(1 - 2p_i - 2p_j + 6p_i p_j), \\
E(p'_i - p_i)^2(p'_j - p_j)(p'_h - p_h) &= -\frac{1}{N^2} p_i p_j p_h(1 - 3p_i) + \frac{2}{N^3} p_i p_j p_h(1 - 3p_i), \\
E(p'_i - p_i)(p'_j - p_j)(p'_h - p_h)(p'_g - p_g) &= \frac{3}{N^2} p_i p_j p_h p_g - \frac{6}{N^3} p_i p_j p_h p_g. \;{}^{*}
\end{aligned}
\qquad (10)
\]
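The product-moments of the deviations of the frequencies from the probabilities are exact for every N, so a few of them can be tested by complete enumeration of the possible series of N trials. The sketch below is illustrative only: the function name, the four-category scheme and the particular probabilities used in any call are hypothetical, and the comparison formulas are the ones for $E(p'_i - p_i)^2(p'_j - p_j)^2$, $E(p'_i - p_i)^3(p'_j - p_j)$ and the product of four distinct deviations as read above.

```python
from fractions import Fraction
from itertools import product


def frequency_moment(probs, n, exponents):
    """Exact E of prod_j (p'_j - p_j)**exponents[j] over n independent trials,
    where p'_j is the relative frequency of category j."""
    total = Fraction(0)
    for series in product(range(len(probs)), repeat=n):
        weight = Fraction(1)
        counts = [0] * len(probs)
        for category in series:
            weight *= probs[category]
            counts[category] += 1
        term = weight
        for count, p, power in zip(counts, probs, exponents):
            term *= (Fraction(count, n) - p) ** power
        total += term
    return total


def moment_22(n, pi, pj):
    """Assumed reading of E (p'_i - p_i)^2 (p'_j - p_j)^2."""
    n = Fraction(n)
    return (pi * pj * (1 - pi - pj + 3 * pi * pj) / n**2
            - pi * pj * (1 - 2 * pi - 2 * pj + 6 * pi * pj) / n**3)


def moment_31(n, pi, pj):
    """Assumed reading of E (p'_i - p_i)^3 (p'_j - p_j)."""
    n = Fraction(n)
    return (-3 * pi**2 * pj * (1 - pi) / n**2
            - pi * pj * (1 - 6 * pi + 6 * pi**2) / n**3)


def moment_1111(n, pi, pj, ph, pg):
    """Assumed reading of the expectation of four distinct deviations."""
    n = Fraction(n)
    return pi * pj * ph * pg * (3 / n**2 - 6 / n**3)
```

A call such as `frequency_moment([Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)], 4, (2, 2, 0, 0))` enumerates the 256 possible series and should agree with `moment_22(4, Fraction(1, 2), Fraction(1, 4))`.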
In the general case we find: 


r-1 
E(p; - py = * W2 = (—1¥ pf 7 V"«, Oy, rrk—f Br+k-s,k »+-(11), 
f=Ent. (es) 


and hencet 
E (pi — pi =1.3.5...2r—1) {repr — py 


+ ya a- pi) a _) [(2r— 1)- 2p; (1 me pi) (49r + 1)] 


. (12), 





es sol ae Ee. ; 
+ apr Pi (1 = pi ——gr59 — [(20r* = 60r* + 31r + 15) 


- 4p; (1 — p;) (40r* — 3077 -—r + 3) 
+ 4p2(1 — p,)? (80r? + 1207? + Tr —21)] + vf 


, 1 il 
E (pi — py =1.38.5...(2r +1) (1—2p,) {en De (pd 


r(r— 
810 


+ yrs (1 — prt SO C=) (cage — 8r2 — Tr + 147 — 54) 





Pp ” i 


+ yap (1- "Dior 5r—38)—2p.(1 —p,) (20 +35r+12)] 


r (13), 





— 4p; (1 — p,) (56r+ + 42r — 77r* — 42r + $) 
+ 4p? (1 — p,)? (1124 + 5049? + 6657? + 189r — 72)] + sid 





* The expressions for $E(p'_i - p_i)^4$, $E(p'_i - p_i)^3(p'_j - p_j)$, and so on, show how dangerous it is to reject, without due qualification, terms containing $1/N$ to higher powers. Depending on the magnitude of $p_i$, $E(p'_i - p_i)^4$ may be either greater or less than $\frac{3}{N^2} p_i^2(1 - p_i)^2$, according as $p_i(1 - p_i) \lessgtr \frac{1}{6}$; when $p_i$ is very small (of order 1/N) the term rejected and the term retained are of the same order.

† Cf. my paper, previously cited, "On the Mathematical Expectation of a Positive Integral Power of the Difference between the Frequency and the Probability of an Event." Both formulae may easily be obtained directly from formulae (22) and (23) of Chapter I.
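\[
\begin{aligned}
E(p'_i - p_i)^5 &= (1 - 2p_i)\Big\{\frac{10}{N^3} p_i^2(1 - p_i)^2 + \frac{1}{N^4} p_i(1 - p_i)[1 - 12p_i + 12p_i^2]\Big\}, \\
E(p'_i - p_i)^6 &= \frac{15}{N^3} p_i^3(1 - p_i)^3 + \frac{5}{N^4} p_i^2(1 - p_i)^2[5 - 26p_i(1 - p_i)] \\
&\quad + \frac{1}{N^5} p_i(1 - p_i)[1 - 30p_i(1 - p_i) + 120p_i^2(1 - p_i)^2], \\
E(p'_i - p_i)^7 &= (1 - 2p_i)\Big\{\frac{105}{N^4} p_i^3(1 - p_i)^3 + \frac{1}{N^5} p_i^2(1 - p_i)^2[56 - 462p_i(1 - p_i)] \\
&\quad + \frac{1}{N^6} p_i(1 - p_i)[1 - 60p_i(1 - p_i) + 360p_i^2(1 - p_i)^2]\Big\}.
\end{aligned}
\qquad (14)
\]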
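The three expressions of (14) can be compared with a direct summation over the binomial law. This is a hedged sketch rather than the author's procedure: it assumes the reading of (14) in which every bracket is a polynomial in $s = p_i(1 - p_i)$, and the function names are invented for the illustration.

```python
from fractions import Fraction
from math import comb


def frequency_central_moment(n, p, order):
    """E (p' - p)**order, summed directly over the binomial probabilities."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               * (Fraction(k, n) - p) ** order
               for k in range(n + 1))


def formula_14(n, p, order):
    """Assumed reading of (14) for order 5, 6 or 7, with s = p(1 - p)."""
    s = p * (1 - p)
    n = Fraction(n)
    if order == 5:
        return (1 - 2 * p) * (10 * s**2 / n**3 + s * (1 - 12 * s) / n**4)
    if order == 6:
        return (15 * s**3 / n**3 + 5 * s**2 * (5 - 26 * s) / n**4
                + s * (1 - 30 * s + 120 * s**2) / n**5)
    if order == 7:
        return (1 - 2 * p) * (105 * s**3 / n**4
                              + s**2 * (56 - 462 * s) / n**5
                              + s * (1 - 60 * s + 360 * s**2) / n**6)
    raise ValueError("only orders 5, 6 and 7 are covered by (14)")
```

Since both sides are exact rational functions of N and $p_i$, agreement for a handful of values of N is already a fairly strong check on the numerical coefficients.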


ee es 
Replacing Ni-(s+4)] in (5) by 2% (—1)' Baysac Wt", we find after some 
1=0 


| transformations: 
| F re mitral | h h —h; 
Epi -_— PA NI 2 z(— 1 ae Ory, ry—hy Arg, rahe Bey try—ty he, fy hg le! by 


where the summation for h, extends to all positive integer values from 0 to the 
smaller of the numbers f and r, — 1, and the summation for A, to all integer values 
from 0 to the smaller of the numbers r,—1 and f— hy. 


Substituting the values of Ep;"—4 p/:-4 in the development of 
aS *# 
E (pi — py (pj -— Bi" = 2, z, (— 144 0,5 0, pi pp Epi’ pink, 


we find after some rather tedious transformations: 


E r - , a Ttre-2 1 SF (or 7,-1) \ 
(pi — pi)” (pj — pi)" = (rs uy A 
f=Ent. (+—2— F 
2 
S—h, (or r,-1) wai on , 
x = (— 1a pis pya- Vin) V(r) Ory, ry~hy Arg, ra—he (16) 
2= c . 
B 7y+12—hy—hg, f—hy—he 
1-1 r-1 
+ em Fa _ 2 (—1)ntr-h- ha] g ee 
Nnrtn-l nt ae Ty Ty" Ta, Ta—Mg 


hy yeh 
Beytry—hy—he faba pi “ pj” *) 
In the general case we have: 


\ 
E (p's, — Dis)” (Pig — Dig)” --- (Dig — Pin) 
MAT +. +t 2 pf orn-2) S-h, (or T.—1) Srhy—hy—.. yy (or r,.-) 
wy ie : 
Por pe NT =o hyn A,=0 


(RM BDI NT) Moy My dean FO 


i, (r,) (72) wey 
Ar,, rE-hy |. ee a eS oe -hy 
1 r,-1 4-1 i rah, Th, 


k 
+ Wrwea= > > od = (- Lyte tym him a pp 


t t 
h,=0A,=0 h,=0 ’ k 








x ay, Ty-hy** Opp, ry—hy Britt rym Is Pt... $7 hm hy, 


ee 
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If we agree to put A,,=1 and C,-1=0, then if 7,+7,+ ...+7%= 2r, the first 
term can be brought to the form (cf. Introduction (28)): 
1 r(orr,—1) r-h, (or 7,-1) r—h,—hg~...—hy—, (0 7, —1) 


ie Se b (—1y-2 pt pt |. pe 


eae hy=0 h,=0 " *s % 


1 cyte 2h 
x OF OF Op Ane Aine ++ Abnys Br—H,0 


| (18), 


and when 7, + 7, +... + 7 = 2r +1, to the form (cf. Introduction (29)): 
1 r+l(or r,-1) r+1—A, (or 7,—-1) r+1—h,—hy—...—he-; (Or 7-1) 

wea - (—1)rt1-F pt 

N™ 00 hg=0 hy=0 i; 

shy -h 2h, 2h 

xp. pat {H. OF OM An, AneBrvnne — | 

+ (19). 

2 2h 2h,—1 72h, 2h 

+O” 0.0, Adng ++» Aine Betr-eza + Of,” OF"... OF Any, 


rT. 


x Ai, , ee ae Byss-,0+ ee 
2h, py2he 2he—, py2he-1 
+O ON OR OM Aig Ainge Abaeiye Ate Bras-itc | 
Noting that, in accordance with (3), 





Se Ss 1 
p's, P's +++ Pig = ’ Pin Pig +++ Pig = Piy Pty +++ Pig & (—1)* HA r,n (20), 


we find, on the other hand, easily : 








E (p's, — Piz) (Pig — Pig) +++ (Pig — Dig) 
k-1 ( k-1 a ih 21 
= Pi, Pig-+- Pig > SRO Ban = Pi Pa ‘Py Cc? Baal ° 
k=Ent. (=) 
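Hence:

\[
\begin{aligned}
E(p'_{i_1} - p_{i_1})(p'_{i_2} - p_{i_2}) \cdots (p'_{i_5} - p_{i_5}) &= p_{i_1} p_{i_2} \cdots p_{i_5}\Big\{-\frac{20}{N^3} + \frac{24}{N^4}\Big\}, \\
E(p'_{i_1} - p_{i_1}) \cdots (p'_{i_6} - p_{i_6}) &= p_{i_1} \cdots p_{i_6}\Big\{-\frac{15}{N^3} + \frac{130}{N^4} - \frac{120}{N^5}\Big\}, \\
E(p'_{i_1} - p_{i_1}) \cdots (p'_{i_7} - p_{i_7}) &= p_{i_1} \cdots p_{i_7}\Big\{\frac{210}{N^4} - \frac{924}{N^5} + \frac{720}{N^6}\Big\}.
\end{aligned}
\qquad (22)
\]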





II 


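(1) Noting that

\[ m_{2,(N)} = E X_{(N)}^2 = E\Big[\sum_{j=1}^{k} p'_j\,\xi_j\Big]^2, \]

we find

\[
\begin{aligned}
m_{2,(N)} &= E\Big[\sum_{j=1}^{k} p_j'^{\,2}\xi_j^2 + \sum_{j=1}^{k}\sum_{h \neq j} p'_j p'_h\,\xi_j \xi_h\Big] \\
&= \sum_{j=1}^{k} \xi_j^2\, E p_j'^{\,2} + \sum_{j=1}^{k}\sum_{h \neq j} \xi_j \xi_h\, E p'_j p'_h \\
&= \sum_{j=1}^{k} \xi_j^2\Big[p_j^2 + \frac{1}{N} p_j(1 - p_j)\Big] + \sum_{j=1}^{k}\sum_{h \neq j} \xi_j \xi_h\Big[p_j p_h - \frac{1}{N} p_j p_h\Big] \\
&= \Big[\sum_{j=1}^{k} p_j \xi_j\Big]^2 + \frac{1}{N}\Big\{\sum_{j=1}^{k} p_j \xi_j^2 - \Big[\sum_{j=1}^{k} p_j \xi_j\Big]^2\Big\} = m_1^2 + \frac{1}{N}\{m_2 - m_1^2\}.
\end{aligned}
\]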
In a similar way, the other formulae deduced above may be obtained (Chapter I, § I).

(2) Noting that

\[ m'_r - m_r = \sum_{j=1}^{k} (p'_j - p_j)\,\xi_j^{\,r}, \]

we find:

\[
\begin{aligned}
E(m'_{r_1} - m_{r_1})(m'_{r_2} - m_{r_2}) &= E\Big\{\sum_{j=1}^{k} (p'_j - p_j)^2\,\xi_j^{\,r_1+r_2} + \sum_{j=1}^{k}\sum_{h \neq j} (p'_j - p_j)(p'_h - p_h)\,\xi_j^{\,r_1}\xi_h^{\,r_2}\Big\} \\
&= \sum_{j=1}^{k} \xi_j^{\,r_1+r_2}\,\frac{1}{N} p_j(1 - p_j) - \sum_{j=1}^{k}\sum_{h \neq j} \xi_j^{\,r_1}\xi_h^{\,r_2}\,\frac{1}{N} p_j p_h \\
&= \frac{1}{N}\{m_{r_1+r_2} - m_{r_1} m_{r_2}\}.
\end{aligned}
\qquad (23)
\]





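Similarly we find:

\[
E(m'_{r_1} - m_{r_1})(m'_{r_2} - m_{r_2})(m'_{r_3} - m_{r_3}) = \frac{1}{N^2}\{m_{r_1+r_2+r_3} - [m_{r_1+r_2} m_{r_3} + m_{r_1+r_3} m_{r_2} + m_{r_2+r_3} m_{r_1}] + 2 m_{r_1} m_{r_2} m_{r_3}\}, \qquad (24)
\]

\[
\begin{aligned}
E(m'_{r_1} - m_{r_1})&(m'_{r_2} - m_{r_2})(m'_{r_3} - m_{r_3})(m'_{r_4} - m_{r_4}) \\
&= \frac{N-2}{N^3}\{[m_{r_1+r_2} - m_{r_1} m_{r_2}][m_{r_3+r_4} - m_{r_3} m_{r_4}] \\
&\qquad + [m_{r_1+r_3} - m_{r_1} m_{r_3}][m_{r_2+r_4} - m_{r_2} m_{r_4}] \\
&\qquad + [m_{r_1+r_4} - m_{r_1} m_{r_4}][m_{r_2+r_3} - m_{r_2} m_{r_3}]\} \\
&\quad + \frac{1}{N^3}\{m_{r_1+r_2+r_3+r_4} - [m_{r_1+r_2+r_3} m_{r_4} + m_{r_1+r_2+r_4} m_{r_3} \\
&\qquad + m_{r_1+r_3+r_4} m_{r_2} + m_{r_2+r_3+r_4} m_{r_1}] \\
&\qquad + m_{r_1+r_2} m_{r_3+r_4} + m_{r_1+r_3} m_{r_2+r_4} + m_{r_1+r_4} m_{r_2+r_3}\},
\end{aligned}
\qquad (25)
\]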
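and hence, or directly from the formulae of § I:

\[
\begin{aligned}
E(m'_r - m_r)^2 &= \frac{1}{N}[m_{2r} - m_r^2], \\
E(m'_r - m_r)^3 &= \frac{1}{N^2}[m_{3r} - 3 m_{2r} m_r + 2 m_r^3], \\
E(m'_r - m_r)^4 &= \frac{3(N-2)}{N^3}[m_{2r} - m_r^2]^2 + \frac{1}{N^3}[m_{4r} - 4 m_{3r} m_r + 3 m_{2r}^2].
\end{aligned}
\qquad (26)
\]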
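The three relations (26) hold exactly for every N, which makes them convenient to test by enumeration. The following sketch is an illustration under assumptions: it takes (26) in the form with the factor $3(N-2)/N^3$ and the bracket $m_{4r} - 4m_{3r}m_r + 3m_{2r}^2$, and uses a hypothetical three-valued variable; the function names are not from the original.

```python
from fractions import Fraction
from itertools import product


def raw_moment(values, probs, order):
    """m_order = sum_j p_j * xi_j**order."""
    return sum(p * x ** order for x, p in zip(values, probs))


def exact_moment_of_sample_raw_moment(values, probs, n, r, power):
    """E (m'_r - m_r)**power by enumerating all ordered samples of size n."""
    m_r = raw_moment(values, probs, r)
    total = Fraction(0)
    for sample in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for index in sample:
            weight *= probs[index]
        sample_moment = sum(values[index] ** r for index in sample) / n
        total += weight * (sample_moment - m_r) ** power
    return total


def formula_26(values, probs, n, r, power):
    """Assumed reading of (26) for power 2, 3 or 4."""
    m1 = raw_moment(values, probs, r)
    m2 = raw_moment(values, probs, 2 * r)
    m3 = raw_moment(values, probs, 3 * r)
    m4 = raw_moment(values, probs, 4 * r)
    n = Fraction(n)
    if power == 2:
        return (m2 - m1**2) / n
    if power == 3:
        return (m3 - 3 * m2 * m1 + 2 * m1**3) / n**2
    if power == 4:
        return (3 * (n - 2) / n**3 * (m2 - m1**2) ** 2
                + (m4 - 4 * m3 * m1 + 3 * m2**2) / n**3)
    raise ValueError("only powers 2, 3 and 4 are covered by (26)")
```

Note that the fourth relation is written with $N - 2$ rather than $N - 1$ because the remaining $3[m_{2r} - m_r^2]^2/N^3$ has been absorbed into the last bracket.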


Biometrika xu 
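III

(1) Denoting by $\omega'$ the difference $X_{(N)} - m_1$ and by $d\mu'_r$ the difference $\mu'_r - \mu_r$, we find

\[
\begin{aligned}
\nu'_r &= \sum_{j=1}^{k} p'_j[\xi_j - X_{(N)}]^r = \sum_{j=1}^{k} p'_j \sum_{h=0}^{r} (-1)^h C_r^h (\xi_j - m_1)^{r-h}\,\omega'^{\,h} \\
&= \mu'_r - C_r^1\,\omega' \mu'_{r-1} + C_r^2\,\omega'^{\,2} \mu'_{r-2} - C_r^3\,\omega'^{\,3} \mu'_{r-3} + \ldots \\
&= \mu_r + d\mu'_r - C_r^1\,\omega' \mu_{r-1} - C_r^1\,\omega' d\mu'_{r-1} + C_r^2\,\omega'^{\,2} \mu_{r-2} \\
&\quad + C_r^2\,\omega'^{\,2} d\mu'_{r-2} - C_r^3\,\omega'^{\,3} \mu_{r-3} - C_r^3\,\omega'^{\,3} d\mu'_{r-3} + \ldots
\end{aligned}
\qquad (27)
\]

W. F. Sheppard, in his well-known investigation "On the Application of the Theory of Error to Cases of Normal Distribution and Normal Correlation*," terminates the development at the third term, taking

\[ \nu'_r = \mu_r + d\mu'_r - r\,\omega' \mu_{r-1}. \qquad (28) \]

Hence:

\[
\begin{aligned}
\nu_r &= E\nu'_r = \mu_r, \\
\nu'_r - \nu_r &= \nu'_r - \mu_r = d\mu'_r - r\mu_{r-1}\,\omega', \\
E(\nu'_r - \nu_r)^2 &= E(\nu'_r - \mu_r)^2 = E(d\mu'_r)^2 - 2r\mu_{r-1}\,E\omega' d\mu'_r + r^2\mu_{r-1}^2\,E\omega'^{\,2} \\
&= \frac{1}{N}(\mu_{2r} - \mu_r^2) - \frac{2}{N} r\mu_{r-1}\mu_{r+1} + \frac{1}{N} r^2\mu_{r-1}^2\mu_2 \\
&= \frac{1}{N}\{\mu_{2r} - \mu_r^2 - 2r\mu_{r-1}\mu_{r+1} + r^2\mu_{r-1}^2\mu_2\}.
\end{aligned}
\]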

We thus obtain, with full accuracy (cf. Chapter III (13)), the first term of the development of $E(\nu'_r - \nu_r)^2$ in powers of $1/N$. This is explained by the fact that the terms rejected by Sheppard in the formula (27) do not yield terms of order $1/N$ in $E(\nu'_r - \nu_r)^2$. Owing to the same circumstance, we also get accurately the first term in the development of $E(\nu'_{r_1} - \nu_{r_1})(\nu'_{r_2} - \nu_{r_2})$, starting with (28):

\[
\begin{aligned}
E(\nu'_{r_1} - \nu_{r_1})(\nu'_{r_2} - \nu_{r_2}) &= E(\nu'_{r_1} - \mu_{r_1})(\nu'_{r_2} - \mu_{r_2}) \\
&= E[d\mu'_{r_1} - r_1\mu_{r_1-1}\,\omega'][d\mu'_{r_2} - r_2\mu_{r_2-1}\,\omega'] \\
&= \frac{1}{N}\{\mu_{r_1+r_2} - \mu_{r_1}\mu_{r_2} - r_1\mu_{r_1-1}\mu_{r_2+1} - r_2\mu_{r_1+1}\mu_{r_2-1} + r_1 r_2\,\mu_{r_1-1}\mu_{r_2-1}\mu_2\}.
\end{aligned}
\qquad (29)
\]


* Phil. Trans. A, Vol. 192. 
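But we cannot start with (28) in the calculation of the mathematical expectation of $(\nu'_r - \nu_r)^3$, just as we cannot obtain the further terms in the development of $E(\nu'_r - \nu_r)^2$ in powers of $1/N$. For these purposes Sheppard's method must be put into a slightly changed form: more than three terms must be retained in (27). In the calculation of the terms of order $1/N^2$, we must at the same time rely on the formulae of § III of the second Chapter and on the following relations easily deduced from them:

\[
\begin{aligned}
E\omega' d\mu'_r &= \frac{1}{N}\mu_{r+1}, \\
E\omega'^{\,2} &= \frac{1}{N}\mu_2,
\end{aligned}
\qquad (30)
\]

\[
\begin{aligned}
E\omega' d\mu'_{r_1} d\mu'_{r_2} &= \frac{1}{N^2}[\mu_{r_1+r_2+1} - \mu_{r_1}\mu_{r_2+1} - \mu_{r_1+1}\mu_{r_2}], \\
E\omega'^{\,2} d\mu'_r &= \frac{1}{N^2}[\mu_{r+2} - \mu_r\mu_2], \\
E\omega'^{\,3} &= \frac{1}{N^2}\mu_3,
\end{aligned}
\qquad (31)
\]

\[
\begin{aligned}
E\omega' d\mu'_{r_1} d\mu'_{r_2} d\mu'_{r_3} &= \frac{1}{N^2}[\mu_{r_1+1}\mu_{r_2+r_3} + \mu_{r_2+1}\mu_{r_1+r_3} + \mu_{r_3+1}\mu_{r_1+r_2} \\
&\qquad - \mu_{r_1+1}\mu_{r_2}\mu_{r_3} - \mu_{r_1}\mu_{r_2+1}\mu_{r_3} - \mu_{r_1}\mu_{r_2}\mu_{r_3+1}] \\
&\quad + \frac{1}{N^3}[\mu_{r_1+r_2+r_3+1} - (\mu_{r_1+r_2+1}\mu_{r_3} + \mu_{r_1+r_3+1}\mu_{r_2} + \mu_{r_2+r_3+1}\mu_{r_1}) \\
&\qquad - (\mu_{r_1+r_2}\mu_{r_3+1} + \mu_{r_1+r_3}\mu_{r_2+1} + \mu_{r_1+1}\mu_{r_2+r_3}) \\
&\qquad + 2(\mu_{r_1+1}\mu_{r_2}\mu_{r_3} + \mu_{r_1}\mu_{r_2+1}\mu_{r_3} + \mu_{r_1}\mu_{r_2}\mu_{r_3+1})], \\
E\omega'^{\,2} d\mu'_{r_1} d\mu'_{r_2} &= \frac{1}{N^2}[(\mu_{r_1+r_2} - \mu_{r_1}\mu_{r_2})\mu_2 + 2\mu_{r_1+1}\mu_{r_2+1}] \\
&\quad + \frac{1}{N^3}[\mu_{r_1+r_2+2} - (\mu_{r_1+2}\mu_{r_2} + \mu_{r_1}\mu_{r_2+2}) \\
&\qquad - (\mu_{r_1+r_2}\mu_2 + 2\mu_{r_1+1}\mu_{r_2+1}) + 2\mu_{r_1}\mu_{r_2}\mu_2], \\
E\omega'^{\,3} d\mu'_r &= \frac{3}{N^2}\mu_{r+1}\mu_2 + \frac{1}{N^3}[\mu_{r+3} - \mu_r\mu_3 - 3\mu_{r+1}\mu_2], \\
E\omega'^{\,4} &= \frac{3}{N^2}\mu_2^2 + \frac{1}{N^3}[\mu_4 - 3\mu_2^2].
\end{aligned}
\qquad (32)
\]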




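(2) To get the exact value of terms of order $1/N^2$ in the development of $E(\nu'_{r_1} - \mu_{r_1})(\nu'_{r_2} - \mu_{r_2})$ in powers of $1/N$, it is necessary to start from

\[
\begin{aligned}
\nu'_r - \mu_r &= (d\mu'_r - r\mu_{r-1}\,\omega') - \Big(r\,\omega' d\mu'_{r-1} - \frac{r(r-1)}{2}\mu_{r-2}\,\omega'^{\,2}\Big) \\
&\quad + \Big(\frac{r(r-1)}{2}\,\omega'^{\,2} d\mu'_{r-2} - \frac{r(r-1)(r-2)}{6}\mu_{r-3}\,\omega'^{\,3}\Big).
\end{aligned}
\]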
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After some simple transformations we find : 


, ’ 1 
Ev — r,) (v a Hy,) sa WV {Mrstrs — Peary Pre — 11 Pym Pret — 12 Peyt Pre-1 
HTT 2 fyi Pri He} 





1 T,+12)(%1+%—1 
+ yi [Cit 1) tent : crate ia Dig cule 
T(%2—1) r, (1-1) 





+ 7 Pryt2 brg—-2 + — 2 Mey—2 Pry+2 


rt] 
+ 12 (7; +12) Prt Mra +1 (7, + 12) Mey Prgt — -_ Pry Pres Ha 


= £7) pegs Merge My + (11 + 12+ 27172) fry berg 


m1(7,-1 
u( 3 ) (Br + 2) 44-2 Pre M2 


2 (72-1 
— CD) (Sr, + 2) pin hry a — 
~~ gr, T2 (7, +72) Bry-1 Mr—i 2 — $n T2 (rT: ‘© 1) Mry-1 Pra—2 Bs 
-$n(n-l1)r Pr—2 Pegi Hs + 37: r{-'] Mri Pry—s Ba? + trl r, Pays Mr Me 


+ in(n-lI n(n 1) Hyatt +... 


Noting that 


\ 


> (33). 





E (v's, ae Vr;) (v's, we Vr,) et E(v',, —- br,) (vr, a borg) aL (Vp, me br,) (Vy, pow Mr) 


' ’ 1 
=Ev "~~ Mr,) (V's, — Pr) — Ne [rir Mery Mery — $7172 (T2 — 1) Mery Mga Ma 


= 4r, (r; - 1) Te Pry—2 Pre Ma so fr, (1, - 1) Te (r2 = 1) Hr ,-2 Mro--2 Me] ekg 


we find hence: 


’ , 1 
E (v tae Vy,) (v —“= Ve_) = WV { frs+re 72 Pye Prema — 11 Pry—i Peet 


— Mey bry $1112 Mey beg po} 


1 { T1+72) (11+ —1 
+a} (r,+12) nan +e 2 - : Pie" 





+ $7, (72 —1) M42 by—2 + $r,(r, —1) Mey—2 Prete 
+ 12 (17; + 12) Pry Pry $11 (71 +12) Pry Pret — $r{—) Mey Pre—s Be 
— tr Peay—s Pret Pa t (7) + T2 + 1172) Me, bee 
— (1 41) 12 (72 — VD) Mey Mage a — 12 (71 — 1) (72 + 1) Bye Mer, He 
ie $r; T2(71 + 12) Mey-1 Prg-1 Me — 4 T11 2 (r — 1) Pry-1 Pra-2 Bs 


— $0 (11 1) 1s Mee Mega Ms + ENTE Mery Mry—3 Me + $I, Pry—3 Pry—1 M2? 


ba $7 ("; as 1) Te (2 a 1) Me-2 Mre—2 us ates 


\ (34). 
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Putting r, = r, =r in this, we find (cf. (13) of Chapter III): 


; 1 
Ey, = Ve) = WV {Mop — 2P fey Pea — fog +7? pw He} 


+ apa {20 bar + (Qe =) plage a + (0 — 1) tpg Hane Re 
+ 40? ppg bra — TH ppg Ms Me 
+7 (9 + 2) py? — 2x (7? — 1) py fy—o fe — Br? pPpa fo — 7° (7 — 1) pr ye Ms 
+ 7° (7 — 1) (17 — 2) papa Mys Ma? + 7° (7 — LP py 2 pe*} + -.- 
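(3) To find accurately the first term (of order $1/N^2$) in the development of

\[ E(\nu'_{r_1} - \mu_{r_1})(\nu'_{r_2} - \mu_{r_2})(\nu'_{r_3} - \mu_{r_3}), \]

we must start with

\[ \nu'_r - \mu_r = (d\mu'_r - r\,\omega'\mu_{r-1}) - \Big(r\,\omega' d\mu'_{r-1} - \frac{r(r-1)}{2}\mu_{r-2}\,\omega'^{\,2}\Big). \]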


Using the relations (30), (31) and (32), we find without difficulty : 
E(',, 5 Hr) (v's, 7. ry) (v's, i brs) 
1 
* WV? {insti a: [Hryire Mrs + Mryirs Pre + Br, Prsirs| + 2M, rz brs 
—([n Paya (Mrgergt — Pre Prs — Pre Pry) +12 Pr (Mert — Prt Bes 
= Por, Pergr) + 1s Mery (Meera — Pry Pre — Pry Pery+1)] 
+ [1172 ry ae (Mrgt2 — Mrs Ho) + 111s ber, Pry-1 (orate — Pr, Me) 
+ 273s Pry Prs—1 (Mry42 — Pry Me)) — 717273 Mr Pry Pers Bs 
-- [1s (Mert Meretts—1 + Meet Meytrs—i + Mrs Mrytre — Pye Pre Bsa 
— Pry Brett Berga — Per, Pry Pr) 
+1, (Mrs Pretest Prgti Meytre—i + Mery Prytrs — Pye Pre Pry 
— Pry Berga Ping ti — Pry Ming Pry) 
Hy (Mast ry trea H Pry tt Prytrs—i + Mr, Brates — Pry Pre Mest 
— Pry Peryts Mery — Pry Pre Mery) 
+ [117s Mya Margery a + 2plrgsr Pry — Pry Prg—1 He) 
+ 127 bery—a (Mrytry—1 Ho + Zor Pry — Pry Mry—i Me) } (56). 
FOYT Pyar (rg trg—a My + 2 pony Merger — rea Pry He) 
+ 1203 Mga (Mery try He + 2p ta Be, — Mery Pri Me) 
FT 2 Magy (Mr trg—1 Me + Zoey Merger — Mry—a Pry Me) 
+ TAN s Mopga (Mery prg—i Be + 2 pp, Mga — Mery—1 Pere M2) 
— [Bry 7.73 (Mey—a Pery—1 Pry Pa + Bery—i Pry Pery—1 Ma + Pry Mire Pry 2) | 
+ [$13 (1's — 1) prgo Marytre M2 + 2eryar Prati — Pr, Pre M2) 
+472 (72— 1) ppg a (Mr ery He + Zora Merger — bry Mrs He) 
+ ATCT — 1) Ber (Merging He + erat Parsi — Mere Mors H2)) 





VoL. 12 .@) 
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— 3 [brits (%3— 1) Bea Merges Prg—2 Ma + $1273 (1's — 1) berg Miya Perg—2 He 
+4rre("% - 1) Berni Pree Pry He + $72 (r,—1) Ts Pry Pre—-2 Prs—1 Mo 
+ 47, (n- 1) 2 My,-2 Mri Mrs Pot $n" (r; —1)r; Bee Pret Prs-1 He] I 

+ 3[4n7273(7s — 1) Pory— Prg—1 Pery—2 Me + $7172 (r2—1) 





X 13 Pri Mr.—2 Mrs—1 per +47; (7,—1) rers Mer,—2 Pre—1 Prs~1 | +... 


Noting that 
E (v's, => Vy) (u'r, aa Vry) (v'r5 age Vy5) =E (v'», = r,) (v's, = Mr) (v's5 ge Hrs) 
_ {(Vr, = br,) E(v’,, ‘ide Hr,) (v's = brs) ss (vy, = Hr.) 
x E (v's, = Mr,) (v's == Hr) 5s (Vy, “= Hrs) E (v',, os Mr,) (vr = Hr,)} 
+2 (vs, = br,) (Vr, = Mr,) (Vy, = brs) i 
= Ei (v', — Hey) (Ung — Bera) (rg — Brg) + - 7 Mr — 37 (7%: — 1) rye Me) | (37), 
x [Mrstrs — Pere Pers — 12 Pre Brg — 13 Preti Mrg—i + 121s Pry—i Mrs—1 He] 
+[r2 br, — $72 (72 — 1) Hy,-2 H2] [Hr+rs — Pry Pes — 11 Pry Pest 
73 Mayas Myg—a + 117s Myy—1 Prg—1 He] 
+[rs brs — $75 (7s — 1) p52 He] [Hry+re — Bry Pre — 11 Pry Pret 
=o Pers Meg—i + 1172 Mea My,—1 Mel} + -- 
we find hence the first term in the development of 





Ev’, = Vr,) (v's, <a Vyq) (U'rg a Vrs) 


in powers of 1/N. 


Putting 7, = 7, =1; = 7, we find: 
? 1 
E(u, — py) = Wy? {ose — 37 Morgs Hea — 3 (7 +1) bop Me + 9r(7 — 1) por ys pal 


— GF popaa Merga + 61? papa Mea Me + 87? ppg Mpa + 120 (7 +1) begs Me Mea 
+ Br (9 — 1) pps Me—o — 99? (7 — 1) ppg Mpa Mee Me 

+ (3r + 2) we — §r (7 — 1) py? Wys be 
— 97? (7 +1) oe Wp a — 9° Wa os + 39° (7 — 1) pp Mya Me?} +... ] 


» (38), 





A , , 3 ) 
E(v, —v,) = E(v,' - Mr) + WV: [ry — $r (7-1) pps He] | 
% [Map — 29 ppgr Hema — Me? + 7? wp fe] +... 
3 : 
= WN? { Mop —38r Mert Mr = 3 or Mem or Per-. Pr+i | (39) 


+ 67? poop i Mpa Pe + 39? Heys Mpa 

+ 6r (9 + 2) pyar be Mea + 87 (7 — 1) pps Hye 
— 6r? (7 — 1) Megs Mea M2 Ho + 2ps,° 
— Br? (2r + 8) pp Mpa oe — 1° wpa oa + 878 (r —1) wa ret} t+...) 
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(4) The first term (of order 1/N*) in the development of 
Ev’, =3 r,) (v's, 2 Hr.) (vrs a br) (v's, = Mr,) 
in powers of 1/N is obtained exactly from (24). We find: 
E (v's, ay, r,) (v's, a Mr.) (v's5 _— r,) (v's =~ Mr,) 


1 
af y: {[H e400 — Pry br, ] [Mrstre — brs Hr,] 7 [Hrs+45 — Pry Hrs] [Mrstre — Pre Pr] 


+ [Meytre— Mrs Pred [Mretts — Mra Mra] 
— [Maya Mery ts Marg trg + Pratt Prater + Mery Prgtrs — Prati brs Pre 
= Bory Borges Pry — Bry Pry Prrgt1) 
$12 Pryaa (Mayer Margery + Prgts Prytey + Prgts Mrytrs — Bryti Hers Meg 
— Pry Pret bry — Bry Pers Prats) 
+ 1s Psa (Meta Mestre + Brett rye + Meret Prytrs — Bryt Pry Pre 
— Be, Prot Pry Pry Pre Mrs) 
+ 1g Mega (Mayes Prgtrs + Meret Meyers + Bry Beyer, ~ Priti Pre Mes + (40). 
— Peay Pret Pry — Pry Pre Prt) 
HTT 2 Mraa Pra Merge He + 2brgss Prgsr — Pry bry Ha) 
ETAT s Pra Parga (Margery Mo + 2oryss Mega — Ming Pry M2) 
FMM Bry Pr (Mrstrs Ha + pres Mrsir — Pry br Me) 
+1273 esi Prs—1 (Mr,44 Be + 2s Brett — bry Mery Hs) 
+1975 Pr Py (Mears Hat Ta Pre — Pr; rs #2) 
A TST Mpg Pory—r (Mery ery Be + 2p ss Pret — bry Mery He) | 
— 8 [72% s bry a Parga Mtg Berges PF T1727 6 Mra Berg Pergas Pry Po 
FTN 31s Miryar Mergss Mrg—a Mirg—a Pea + 721316 Mya Pry Piry—i Pry Me] 
+ 3r,7o%s14 Bey Pre—i Prs—i Pry-i bs"]} +... 
Noting that 
E (v5, — Yn) (U'ng — Veg) (V'rg — Veg) (Y’rg — Yea) 
=F (v',, — fr,) (v's, — Hrs) (v'y — Hr,) (vr, — Pry) 
— {(¥ 4, — Pry) B Org ~ Br) rg — Hrs) (U'rg — bra) 
+ (Pry — Per) E (vn, — Bory) (Y'rg — Pra) U're — Bera) 
+ (vy — Hrs) E (v's, va Hr,) (v's — Hr.) (v' — Pe.) 
+ (vy, aa Hr,) E (v', = Mr,) (v's, 5h Hr,) (v5 = Hr,)} 
+ {(¥r, — Pr,) (Meg — Pry) E(,, — Pry) (7 rq — bry) 
+ (Yr, az Hr,) (¥;, — Hrs) E (v4, = Pry) (V's — Hr) 
+ (vr, ce Hr,) (Yr, > br,) E (v's, = br.) (V's = Hr) 
+ (Yr, — fry) (Ye, — Hrs) Ev',, =F Hr,) (V's, — Hy) 
+ (v4, — Hey) (Veg — Brg) E (v's, — Hs,) (0'r, — bers) 
+ (Vr, — bry) (Yrg— bry) E (v's, — Bry) (rg — Hr,)4 
—3 (v,, = Hr,) (¥,, —fir,) (¥7, zis Hr,) (¥,, - Br,)s 











Expectation of Moments of Frequency Distributions


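we see that the terms of order $1/N^2$ in the developments of

\[ E(\nu'_{r_1} - \mu_{r_1})(\nu'_{r_2} - \mu_{r_2})(\nu'_{r_3} - \mu_{r_3})(\nu'_{r_4} - \mu_{r_4}) \]

and

\[ E(\nu'_{r_1} - \nu_{r_1})(\nu'_{r_2} - \nu_{r_2})(\nu'_{r_3} - \nu_{r_3})(\nu'_{r_4} - \nu_{r_4}) \]

coincide.

Putting $r_1 = r_2 = r_3 = r_4 = r$, we find:

\[
\begin{aligned}
E(\nu'_r - \nu_r)^4 &= \frac{1}{N^2}\{3[\mu_{2r} - \mu_r^2]^2 - 12r\mu_{r-1}[\mu_{2r}\mu_{r+1} - \mu_{r+1}\mu_r^2] \\
&\qquad + 6r^2\mu_{r-1}^2[\mu_{2r}\mu_2 + 2\mu_{r+1}^2 - \mu_r^2\mu_2] - 12r^3\mu_{r-1}^3\mu_{r+1}\mu_2 + 3r^4\mu_{r-1}^4\mu_2^2\} + \ldots \\
&= \frac{3}{N^2}[\mu_{2r} - \mu_r^2 - 2r\mu_{r-1}\mu_{r+1} + r^2\mu_{r-1}^2\mu_2]^2 + \ldots
\end{aligned}
\qquad (41)
\]

The same formula (41) gives also the first term (of order $1/N^2$) in the development of $E(\nu'_r - \mu_r)^4$.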
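The passage from the expanded form of the leading term to three times a perfect square is a purely algebraic identity, valid whatever numbers stand in place of the moments. A minimal sketch of that check follows; it assumes the two forms $3[\mu_{2r} - \mu_r^2]^2 - 12r\mu_{r-1}[\ldots] + \ldots + 3r^4\mu_{r-1}^4\mu_2^2$ and $3[\mu_{2r} - \mu_r^2 - 2r\mu_{r-1}\mu_{r+1} + r^2\mu_{r-1}^2\mu_2]^2$, and the function names are invented for the illustration.

```python
from fractions import Fraction


def leading_term_expanded(mu, r):
    """N^2 times the first term of E(nu'_r - nu_r)^4, in expanded form.
    mu is any mapping from order to a (rational) moment value."""
    a = mu[2 * r] - mu[r] ** 2
    return (3 * a ** 2
            - 12 * r * mu[r - 1] * (mu[2 * r] * mu[r + 1] - mu[r + 1] * mu[r] ** 2)
            + 6 * r ** 2 * mu[r - 1] ** 2
            * (mu[2 * r] * mu[2] + 2 * mu[r + 1] ** 2 - mu[r] ** 2 * mu[2])
            - 12 * r ** 3 * mu[r - 1] ** 3 * mu[r + 1] * mu[2]
            + 3 * r ** 4 * mu[r - 1] ** 4 * mu[2] ** 2)


def leading_term_squared(mu, r):
    """The same quantity written as three times a square."""
    inner = (mu[2 * r] - mu[r] ** 2
             - 2 * r * mu[r - 1] * mu[r + 1]
             + r ** 2 * mu[r - 1] ** 2 * mu[2])
    return 3 * inner ** 2
```

Since the bracket is N times the first term of $E(\nu'_r - \nu_r)^2$, the identity says that the fourth moment is, to the first order, three times the square of the second, as for a Gauss-Laplace law.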

(5) In the general case if we agree to denote 

(vr, ‘2. Ve) (Voy = Vr) vee (v4 — Vy,) by dy, 

and (v's, — Pers) (Y'rg = Pre) dos (v5, a br) by dy, 
we have: 





i (i 
Ed%y = BB%y— © (v4, - pn) ES 
h=1 Vr, — Brn 
i-1 i (42). 
+ EE (ve, ten) (ra, — Hen) B ee ps 
Ss a Ss es al i Ms (¥'54, — Prn,) (ry, — Pern.) 


We see that, in the developments of Hd®)y and H8é*y in powers of 1/N, 
the first terms (of order 1/N*) coincide, since 


Sy 





S (ve, — bn) Es 
h=1 


Y th — Perr 
. oe ; , 
contains no terms of order yi On the contrary, in the developments of Hd®*) p 
and H6*» y the first terms (of order 1/N**) are different, for 
i ae Hit) y 


>» (Ve, = ge) 
h=1 Vern — Erp 


, 1 
contains terms of order yin: 

Formulae (18) and (19) of Chapter II permit us to calculate H8%v and 
Ed" y in general, to an arbitrary degree of accuracy, in the same way as Edy, 
Edy, Ed“ vy were found above. The actual expression, however, is of so un- 
wieldy a character that I shall limit myself to the calculation of the first term in 
the development of #(v,’ — v,)™, coinciding with the first term in the develop- 
ment of E(v, —p,)™. 


In the calculation of the first term of the development of $E(\nu'_r - \mu_r)^{2i}$ we may take

\[ \nu'_r - \mu_r = d\mu'_r - r\mu_{r-1}\,\omega', \]

and limit ourselves to the calculation of the terms of order $1/N^i$ in the development

\[ E(d\mu'_r - r\mu_{r-1}\,\omega')^{2i} = \sum_{j=0}^{2i} (-1)^j C_{2i}^j\, r^j \mu_{r-1}^j\, E(d\mu'_r)^{2i-j}\,\omega'^{\,j}. \qquad (43) \]

In formula (19) of the second Chapter put r,=7, =... =7;=1, and 

Ti = Tjpe= oe = TH=T; 

we find for 7=0: 
E(du,)*= a ‘2.6... Bla, Os 

forg=1: 


a Sf . 
E (pty 00! = HF, 1.3.5. (2E—V) ts [Mor — be Foe vee (44). 
When j = 2h: 
E (dp,’)*-* ow’ 
i= he ‘ oo: an oye 
=a (17471 .3.5....(8%-—-N-8-2)0 oe 
l 


‘2i-2h Py 
=0 


f=0 
‘ 2 ¢ « P Il-f 2 i-f 
x C2 [2h}21.3.5 ... (Ql-2f—1).1.3.5... (2h- 2-1) py" ew,” 


1.3.5... (2h—1).1.3.5...(2i—2h—1) *P rw 








eae Ss (—1)-/-1 
N‘ l=0 §=f=0 ( > (45). 
f 2¥ [2 nas hh pS Pia h-f | 
Gf)! Pw Prnhe rad 
1.3.5...(2h—1).1.3.5...(2i—2h—1) | 
= ace | 
h (or i-—h) 929 yi hi-a [a _ h jo “ : | 
s r+1 ~ = 2}i-h-g + 
xX is ps 3 ey 
4, (29)! Po — Py’ | 
When j= 2h + 1: 
EB (dp, @ ht 
| : E pte as : ov pultl 2i—2h-2-2 
= yi bal (— 1)-'11..3..5.... (21 - 2h — 2 — 3) Chg, B, 


l(orhk) 4 ‘ 
x = OF [2h + Lee 1.3.5... (21—2f—-1).1.3.5...(2h— 2f-1) 
S=" = s 
l-f 8+ kh-f 
X Poy Pri1 Pe 


| 
| 
| 
| 
| 


1.3.5...(2h+1).1.3.5...(2i—Qh—1) Hart Mem 





= os $ <1; (46). 
Nt 1=0)of=0 
QV [i —h—AUA yp pti vi-an-v-a k-s 
(I fy1(af+iyy Be Beh e 
1.3.5...(2h4+1).1.3.5...(2¢ -—2h—1) 
= yi 
bergen aepe Maka eee... | 


gg (29 +1)! 4 ) 
Substituting in (43) we find, after suitable transformations:

\[ E(\nu'_r - \nu_r)^{2i} = \frac{1\cdot3\cdot5\cdots(2i-1)}{N^i}\,[\mu_{2r} - \mu_r^2 + r^2\mu_{r-1}^2\mu_2 - 2r\mu_{r-1}\mu_{r+1}]^i + \ldots \qquad (47) \]

Noting that

\[ \frac{E(\nu'_r - \nu_r)^{2i}}{\{E(\nu'_r - \nu_r)^2\}^i} \]

tends with increasing N to the limit $1\cdot3\cdot5\cdots(2i-1)$, and the ratio

\[ \frac{[E(\nu'_r - \nu_r)^{2i+1}]^2}{[E(\nu'_r - \nu_r)^2]^{2i+1}} \]

tends with increasing N to the limit zero, we see that the law of distribution of the values of $\nu'_r$ tends with increasing N to the Gauss-Laplace law.


Comparing (47) with (14) of Chapter II, we see that

\[ \frac{E(\nu'_2 - \nu_2)^{2i}}{E(\mu'_2 - \mu_2)^{2i}} \]

tends with increasing N to the limit 1.

In the case in which the law of distribution of the values of the variable X follows the Gauss-Laplace law and $\mu_{2r+1} = 0$, for $r = 0, 1, 2, 3, \ldots \infty$,

\[ \frac{E(\nu'_{2r} - \nu_{2r})^{2i}}{E(\mu'_{2r} - \mu_{2r})^{2i}} \]

tends with increasing N to the limit 1, for every positive integer r. But

\[ \frac{E(\nu'_{2r+1} - \nu_{2r+1})^{2i}}{E(\mu'_{2r+1} - \mu_{2r+1})^{2i}} \]

tends even for a Gaussian distribution of the values of X to the limit different from unity:

\[ \Big\{1 - \frac{1\cdot3\cdot5\cdots(2r+1)}{(2r+3)(2r+5)\cdots(4r+1)}\Big\}^i. \]





Corrigenda to Part I, Biometrika, XII, pp. 140—169.


. 142, Eqn (2) for ay; NI-*) read a, ;NI-4), 

. 142, Eqn (4) for A*0* read a‘0F. 

. 147, 1. 19 for gps1 read gy_). 

. 151, last line Eqn (11) for m,,(w) read mw). 
. 156, Eqn (27) for pp; read p,_;. 


N 
. 157, 1. 6 for * = (X;—m) read 
NV j=1 


oe ss Ss 


Ss 


; F 

V2 

. 157, footnote, for Proc. Imp. Acad. read Mém. Acad. and under Chebysheff refer to t. m1, 
p. 478, especially of Russian edition of his works. 


ae" aera ea 
. 160, 1. 8 for YUN) read v,, (WW) 


. 160, last Eqn of (11) for D”~? read Di") _,. 


1, (m) r,m—2 


am] 


(X;) — mj. 
1 


SS 


. 162, throughout this section y of author’s MS. has been printed Y. 
m (3m —1) 
72 


s Sy 


. 163, last line of Eqn (15) insert ml-*) after , and in 6th line of Eqn (15) for ml-2) 


a 
after oO read m{~%), 


p. 167, 8th line from bottom of page for Ei.) read Ey, dp. 


p. 167, 2nd line from bottom of page for K)!"A’"/ ~"fxt-2%-3) read K' “ "Fot—ok-j) 
‘ ee 
) 


p. 168, 2nd line from bottom of page for K{"~) read again iy 
—h 





AN EXPLANATION OF DEVIATIONS FROM POISSON’S 
LAW IN PRACTICE. 


By “STUDENT.” 


In her paper on the Poisson Law of small numbers, Biometrika, x, p. 36 et seq. 
Miss Whitaker after a very interesting analysis of the various attempts which 
have been made to test Poisson’s Law on actual statistics concludes that “A general 
interpretation based on a very simple conception seems needed for those demo- 
graphic cases in which the law of small numbers appears far more often to 
correspond to a negative than to a positive binomial.” 


The following is an attempt to explore the general question of what effect 
various departures from the conditions which lead to Poisson’s Law have on the 
resulting statistics, and especially which conditions lead to positive and which to 
negative binomials when the exponential might at first sight be expected. 


Poisson’s Law has been applied to the occurrence of different numbers of 
individuals in divisions of space or time: thus of yeast cells in squares of a 
haemacytometer, of deaths from the kick of a horse in Prussian Army Corps which 
may be taken as individuals occurring in divisions of space, or of suicides of 
children per year in Prussia which are individuals occurring in divisions of time. 
In such cases it has been asserted that if the chance of an individual being found 
in a given division is so small that when multiplied by the very large number of 
individuals the product is still a reasonably small number, then the frequency of 
divisions containing 0, 1, 2, ..., r individuals will be given by the terms of the
exponential

$$N e^{-m}\left\{1 + m + \frac{m^2}{2!} + \dots + \frac{m^r}{r!} + \dots\right\},$$

where N is the number of divisions and m the mean number of individuals
occurring in a division.
For the above to be true it is necessary 
(1) That the chance of falling in a division is the same for each individual. 
(2) That the chance of an individual falling in it is the same for each 
division. 


(3) That the fact that an individual has fallen in a division does not affect 
the chance of other individuals falling therein. 










As to these three conditions (1) is seldom or never true. I propose to show 
that this is generally unimportant; unless the chances of some individuals falling 
in a particular division are relatively high the Poisson law holds; the tendency 
however is towards a positive binomial. 

Next (2) is comparatively seldom true except in the case of artificial divisions. 
The result of this, as Pearson has shown, is that a negative binomial fits the 
results better than the exponential. 

Lastly (3) is often untrue. It will be shown that if the presence of an individual 
makes another less likely to fall into a division the positive binomial, but if more 
likely, the negative binomial will fit the figures best. 

We may start from the fact that if the chance of an event happening be q and 
of its not happening p, then the chances of its happening 0, 1, 2, etc. times in 
n trials are given by the terms of the expansion of $(p + q)^n$, viz.

$$p^n,\quad n\,p^{n-1}q,\quad \frac{n(n-1)}{2!}\,p^{n-2}q^2,\quad \text{etc.}$$


As the moment coefficients of this series about the zero end of the range are 
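$$\nu_1 = nq,$$
$$\nu_2 = npq + n^2q^2, \quad\text{whence}\quad \mu_2 = npq,$$
the binomial is completely determined if we know $\nu_1$ and $\mu_2$, for

$$p = \frac{\mu_2}{\nu_1},\qquad q = 1 - p = 1 - \frac{\mu_2}{\nu_1} \qquad\text{and}\qquad n = \frac{\nu_1}{q} = \frac{\nu_1^2}{\nu_1 - \mu_2},$$

and in particular the binomial is positive (i.e. n and q are positive) if $\mu_2/\nu_1 < 1$ and
negative if $\mu_2/\nu_1 > 1$. In the particular case when $\mu_2/\nu_1 = 1$ the binomial becomes the
Poisson exponential.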

It is therefore unnecessary to deal with higher moments than the second for 
the purpose in hand. 
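The determination of the binomial from its first two moments may be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical and not part of the original paper, and exact fractions are used so that the Poisson case (ratio exactly 1) can be recognised.

```python
from fractions import Fraction


def fit_binomial_by_moments(nu1, mu2):
    """Return (p, q, n) for the binomial (p + q)^n with mean nu1 and variance mu2.

    n is None in the Poisson limit mu2 == nu1, where q = 0 and n is infinite.
    """
    nu1, mu2 = Fraction(nu1), Fraction(mu2)
    p = mu2 / nu1          # p = mu2 / nu1
    q = 1 - p              # q = 1 - mu2 / nu1
    n = None if q == 0 else nu1 / q   # n = nu1 / q = nu1^2 / (nu1 - mu2)
    return p, q, n


def kind_of_binomial(nu1, mu2):
    """Classify the fitted curve by the ratio mu2 / nu1."""
    ratio = Fraction(mu2) / Fraction(nu1)
    if ratio < 1:
        return "positive binomial"
    if ratio > 1:
        return "negative binomial"
    return "Poisson exponential"


# Hypothetical moments chosen only for illustration.
example_positive = fit_binomial_by_moments(2, Fraction(3, 2))
example_negative = fit_binomial_by_moments(2, 3)
```

In the second example the variance exceeds the mean, so both n and q come out negative, which is what is meant by a negative binomial.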

Let us first consider the result of each individual having a different chance of 
falling in a given division :— 

Let the chances of n individuals falling in a given division be $q_1, q_2, q_3, \dots, q_n$.
The chances of their not doing so are therefore $(1 - q_1), (1 - q_2), (1 - q_3), \dots, (1 - q_n)$,
and the chances that 0, 1, 2 ... of them will fall in that division are given by the
various terms of the expansion of

$$\{(1 - q_1) + q_1\}\{(1 - q_2) + q_2\}\{(1 - q_3) + q_3\}\dots\{(1 - q_n) + q_n\},$$

i.e. by

$$(1 - q_1)(1 - q_2)\dots(1 - q_n) + S\{q_1(1 - q_2)\dots(1 - q_n)\} + S\{q_1q_2(1 - q_3)\dots(1 - q_n)\} + \dots$$
$$+\, S\{q_1q_2q_3\dots q_r(1 - q_{r+1})\dots(1 - q_n)\} + \dots + q_1q_2q_3\dots q_n,$$

the term $S\{q_1q_2q_3\dots q_r(1 - q_{r+1})\dots(1 - q_n)\}$ giving the chance that exactly r
individuals will fall in the division.





The sum of the above series is clearly unity so that the 1st and 2nd moment
coefficients about the zero end of the series are given by two series of which the
rth terms are

$$r\,S\{q_1q_2\dots q_r(1 - q_{r+1})\dots(1 - q_n)\} \quad\text{and}\quad r^2\,S\{q_1q_2\dots q_r(1 - q_{r+1})\dots(1 - q_n)\}$$

respectively.


These series may be summed by rearranging them in the ascending order of 
the q products thus: 


$$S\{q_1(1 - q_2)(1 - q_3)\dots(1 - q_n)\} = S(q_1) - 2S(q_1q_2) + \dots + (-1)^{r-1}\,r\,S(q_1q_2\dots q_r) + \dots$$
$$2S\{q_1q_2(1 - q_3)(1 - q_4)\dots(1 - q_n)\} = 2S(q_1q_2) + \dots + (-1)^{r-2}\,r(r-1)\,S(q_1q_2\dots q_r) + \dots$$
$$\dots\dots\dots$$
$$t\,S\{q_1q_2\dots q_t(1 - q_{t+1})\dots(1 - q_n)\} = t\,S(q_1q_2\dots q_t) + \dots + (-1)^{r-t}\,r\,\frac{(r-1)!}{(t-1)!\,(r-t)!}\,S(q_1q_2\dots q_r) + \dots$$
$$\dots\dots\dots$$
$$r\,S\{q_1q_2\dots q_r(1 - q_{r+1})\dots(1 - q_n)\} = r\,S(q_1q_2\dots q_r) + \dots$$

Adding these we get on the left $\nu_1$ and on the right $S(q_1)$ + a number of terms
of the form $r(1 - 1)^{r-1}S(q_1q_2\dots q_r)$ which accordingly vanish and we get

$$\nu_1 = S(q_1).$$

In a similar manner it can be shown that

$$\nu_2 = S(q_1) + 2S(q_1q_2);$$


and other moment coefficients about zero can be found in the same way, but we 
are not here concerned with them*. 
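The two results just stated can be checked numerically by multiplying out the product of the factors {(1 - q) + q} directly. The following Python sketch does so in exact arithmetic; the four chances are hypothetical values chosen only for illustration, and the function names are not from the original paper.

```python
from fractions import Fraction
from itertools import combinations


def count_distribution(qs):
    """Chance that exactly r individuals fall in the division, for r = 0..n.

    Built by multiplying out the factors {(1 - q_i) + q_i} one at a time.
    """
    dist = [Fraction(1)]
    for q in qs:
        nxt = [Fraction(0)] * (len(dist) + 1)
        for r, chance in enumerate(dist):
            nxt[r] += chance * (1 - q)      # this individual stays out
            nxt[r + 1] += chance * q        # this individual falls in
        dist = nxt
    return dist


def raw_moments(dist):
    """First and second moment coefficients about the zero end of the range."""
    nu1 = sum(r * chance for r, chance in enumerate(dist))
    nu2 = sum(r * r * chance for r, chance in enumerate(dist))
    return nu1, nu2


# Hypothetical unequal chances for four individuals.
qs = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
dist = count_distribution(qs)
nu1, nu2 = raw_moments(dist)

sum_q = sum(qs)                                          # S(q_1)
sum_pairs = sum(a * b for a, b in combinations(qs, 2))   # S(q_1 q_2)
```

With these values `nu1` agrees with `sum_q` and `nu2` with `sum_q + 2 * sum_pairs`, as the summation in the text requires.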


If $\bar q$, $\overline{q^2}$ are the mean values of $q$ and $q^2$, and $\sigma_q$ is the standard deviation of the q's, obviously

$$\nu_1 = n\bar q \tag{1}$$

and

$$\nu_2 = S(q_1) + 2S(q_1q_2) = S(q_1) + \{S(q_1)\}^2 - S(q_1^2)$$
$$= n\bar q + n^2\bar q^2 - n\overline{q^2} \tag{2}$$
$$= n\bar q + n^2\bar q^2 - n\bar q^2 - n\sigma_q^2 \tag{3}$$

$$\therefore\quad \mu_2 = n\bar q - n\bar q^2 - n\sigma_q^2 = n\bar q\left(1 - \bar q - \frac{\sigma_q^2}{\bar q}\right) \tag{4}$$


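* The moment coefficients are:

$$\mu_2 = n\bar p\bar q - n\,{}_q\mu_2,$$
$$\mu_3 = n\bar p\bar q(\bar p - \bar q) - 3n(\bar p - \bar q)\,{}_q\mu_2 + 2n\,{}_q\mu_3,$$
$$\mu_4 = n\bar p\bar q\{1 + 3(n - 2)\bar p\bar q\} - n\{7 + 6(n - 6)\bar p\bar q\}\,{}_q\mu_2 + 12n(\bar p - \bar q)\,{}_q\mu_3 - 6n\,{}_q\mu_4 + 3n^2\,{}_q\mu_2^2,$$

where ${}_q\mu_2$ etc. are the moment coefficients of the q distribution and $\bar p = 1 - \bar q$.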
If now the distribution of chances is to be represented by the binomial
$(P + Q)^k$, then

$$Q = 1 - \frac{\mu_2}{\nu_1} = 1 - \left(1 - \bar q - \frac{\sigma_q^2}{\bar q}\right) = \bar q + \frac{\sigma_q^2}{\bar q} \tag{5}$$


Since the original q’s are the chances of events happening they are always 
positive so that the above expression must be positive and the binomial positive. 

If now we introduce the Poisson condition that $\bar q$ though positive is negligibly
small (5) becomes in general zero, for $\sigma_q$ is usually of the same order as $\bar q$, and in
that case Poisson’s law holds in spite of the inequality of the original q’s. If
however $\sigma_q^2/\bar q$ is appreciably greater than zero (as in the extreme case
$q_1 = 1$, $q_2 = q_3 = \dots = q_n = 0$, when $\sigma_q^2/\bar q = (n - 1)/n$),
the distribution of chances is to be represented by a positive binomial.
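The behaviour of Q in the two situations just contrasted can be illustrated with a small Python sketch. It evaluates the mean of the q's plus their variance divided by their mean, and cross-checks this against one minus the ratio of the variance of the count to its mean, using the known variance of a sum of independent trials. The chances used are hypothetical, and the function names are assumptions made for the illustration.

```python
from fractions import Fraction


def fitted_Q(chances):
    """Q = mean(q) + var(q) / mean(q), computed from the individual chances."""
    qs = [Fraction(q) for q in chances]
    n = len(qs)
    mean_q = sum(qs) / n
    var_q = sum((q - mean_q) ** 2 for q in qs) / n
    return mean_q + var_q / mean_q


def fitted_Q_from_moments(chances):
    """The same Q obtained as 1 - mu2 / nu1 for the count of individuals."""
    qs = [Fraction(q) for q in chances]
    nu1 = sum(qs)                        # mean of the count
    mu2 = sum(q * (1 - q) for q in qs)   # variance of a sum of independent trials
    return 1 - mu2 / nu1


# Small unequal chances: Q stays negligible, so the Poisson law still serves.
small_unequal = [Fraction(1, 1000), Fraction(2, 1000), Fraction(3, 1000)]
# One individual certain to fall in, the rest certain not to: Q is far from zero.
one_certain = [1, 0, 0, 0]

q_small = fitted_Q(small_unequal)
q_extreme = fitted_Q(one_certain)
```

In the first case Q is of the order of the chances themselves; in the second it is unity, and a positive binomial is required.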


Next we have to consider the effect of disregarding condition (2), namely that 
the chance of an individual falling into it must be the same for each division. 


Let us suppose then that the q’s are all different for each division so that $n\bar q$ is
also different.

Then, writing $m$ for $n\bar q$ and $\bar m$, $\overline{m^2}$, $\overline{n\overline{q^2}}$ for the means of $m$, $m^2$ and $n\overline{q^2}$ taken
over all the divisions, we get

from (1) $$\nu_1 = \bar m \tag{6}$$
from (2) $$\nu_2 = \bar m + \overline{m^2} - \overline{n\overline{q^2}} = \bar m + \bar m^2 + \sigma_m^2 - \overline{n\overline{q^2}} \tag{7}$$
$$\mu_2 = \bar m + \sigma_m^2 - \overline{n\overline{q^2}} \tag{8}$$

As before if $(P + Q)^k$ is the best fitting binomial,

$$Q = \frac{\overline{n\overline{q^2}} - \sigma_m^2}{\bar m}.$$

Hence if $\sigma_m^2 > \overline{n\overline{q^2}}$, which if there is any appreciable variation in m is probable,
since as explained above $n\overline{q^2}$ is generally negligible, a negative binomial will be
found to fit better than the exponential.
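A simple numerical illustration may make this clearer. Suppose, purely as an assumption for the example, that half the divisions have mean 1 and half have mean 3, and that within each division the Poisson exponential holds (so that the term in q squared is negligible). The Python sketch below computes the moments of the pooled distribution; the names used are hypothetical.

```python
import math


def poisson_term(m, r):
    """Chance of r individuals in a division whose mean is m."""
    return math.exp(-m) * m ** r / math.factorial(r)


def pooled_moments(means, r_max=80):
    """Mean and variance of the counts when divisions have different means.

    Each division is assumed equally frequent; the series is truncated at
    r_max, which is ample for the small means used here.
    """
    k = len(means)
    chances = [sum(poisson_term(m, r) for m in means) / k
               for r in range(r_max + 1)]
    nu1 = sum(r * c for r, c in enumerate(chances))
    nu2 = sum(r * r * c for r, c in enumerate(chances))
    return nu1, nu2 - nu1 ** 2


nu1, mu2 = pooled_moments([1.0, 3.0])
Q = 1 - mu2 / nu1   # negative when the variance exceeds the mean
```

Here the variance of the pooled counts exceeds their mean by the variance of m, so Q is negative and the fitted binomial is a negative one.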

Clearly condition (2) is usually not fulfilled in the vital and demographic 
statistics; divisions either of space or time are generally governed by different 

* If we suppose that q does not vary with the individual but that nq (= m) varies with the division,
the moment-coefficients of the m distribution being written ${}_m\mu$, then the moment-coefficients of the
resulting distribution of divisions are as follows:

$$\mu_2 = \bar m + {}_m\mu_2,$$
$$\mu_3 = \bar m + 3\,{}_m\mu_2 + {}_m\mu_3,$$
$$\mu_4 = \bar m + 3\bar m^2 + (7 + 6\bar m)\,{}_m\mu_2 + 6\,{}_m\mu_3 + {}_m\mu_4.$$


environments which will vary the chances of an individual falling into them, and 
so we may expect that as a rule negative binomials will occur in place of the 
exponential. 

Finally, suppose that the presence of an individual in a division influences 
the chance of other individuals falling in that division. 

Clearly it may do so either by way of increasing the chance or diminishing it. 

If the chance be increased it is clear that we shall get for the same mean 
number of individuals per division a larger number of divisions containing high 
numbers of individuals and a larger number of zero divisions. In other words, for 
the same mean we shall get a larger Standard Deviation, so that $\mu_2/\nu_1$ will be
greater than 1 and a negative binomial will fit better than the exponential. On
the other hand, if the chance of other individuals is decreased by the presence
of one already in a division, $\mu_2/\nu_1$ will become less than unity and the best fitting
binomial will be positive. The first of these two cases includes linking or clumping 
of events or bacteria, the second such a thing as the counting of large cells on a 
haemacytometer whose divisions are comparable in size with them. 

We have now shown that a population which might be expected at first sight 
to follow Poisson’s law 

(1) Will do so if the only deviation from the ideal conditions is that the 
chances of different individuals falling into the same division are not equal, as 
long as these chances are all small. 


(2) If in addition to this the chances of some individuals are large a positive 
binomial will fit the results better than the exponential. 

(3) If the different divisions have different chances of containing individuals, 
as is usual, a negative binomial will fit the results better than the exponential, 
except in so far as (2) may interfere. 

(4) If the presence of one individual in a division increases the chance of 
other individuals falling into that division, a negative binomial will fit best, but if 
it decreases the chance a positive binomial. 

Generally speaking (3) is the operating deviation from Poisson’s conditions and 
accordingly most statistics give negative binomials. 

Finally I should like to point out that the object of my original paper (Biometrika,
Vol. V) was to give the user of the haemacytometer a guide to the error which
he may expect from its use, and that the net result was that the probable error of
his count was $0.6745\sqrt{N}$ where N was the total number counted* and that if N be
a reasonably large number tables of the probability integral may be used, otherwise
the exponential (or better still go on counting). This result is not affected by
slight deviations from the Poisson law, any more than slight deviations from the
normal law affect our use of the probability integral tables.

* Biometrika, Vol. V, p. 355. The probable error of m is $0.6745\sqrt{m/M}$ where m is the mean and
M the number of unit areas counted. If in this we put M = 1, then m = N and the total count is
$N \pm 0.6745\sqrt{N}$ as above.
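The rule for the probable error of a count is easily put into a form for routine use. The Python sketch below is a minimal illustration; the function names are hypothetical, and the count of 400 is an assumed example rather than a figure from the paper.

```python
import math

PROBABLE_ERROR_FACTOR = 0.6745  # ratio of probable error to standard deviation


def probable_error_of_mean(m, M):
    """Probable error of the mean count m per unit area over M unit areas."""
    return PROBABLE_ERROR_FACTOR * math.sqrt(m / M)


def probable_error_of_total(N):
    """Probable error of a total count N, i.e. the case M = 1, m = N."""
    return probable_error_of_mean(N, 1)


# Assumed example: 400 cells counted in all.
error_400 = probable_error_of_total(400)
```

Since the error grows only as the square root of N, the relative error falls as counting continues, which is the force of the advice to go on counting.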
(1) Introduction.

The object of the present paper is to inquire what is the proper method of 
examining psychophysical curves as to their goodness of fit. In psychophysics 
various mathematical processes are employed for fitting theoretical curves of 
“ogive” form* (known to psychologists as psychometric functions, but really error
functions), to data of a certain kind, usually threshold measurements collected by
the “Method of Right and Wrong Cases.” The best known of these mathematical
processes is the Müller “Constant Process,” using the probability integral†. To
make the material in which we are about to work understandable, it is necessary 
first to go into some detail as to the nature of the experiments which supply 
the data to be fitted, and as to the theories which have led to such mathematical 
curves being drawn through these data. 


Most of the experiments in question have for their object the determination of 
the conditions of our experiences of equality and difference. For example, suppose 
we compare two weights, one of which is 100 grams, by lifting them in succession 
by the right hand with a number of experimental precautions, into which we need 


* The term in this connexion is Galton’s. 

† G. T. Fechner, Elemente der Psychophysik, 1860; G. E. Müller, “Ueber die Maassbestimmungen
des Ortsinnes der Haut mittels der Methode der richtigen und falschen Fälle,” Pflügers Archiv für die
ges. Physiologie, 1879, XIX, pp. 191—235, especially par. 5 et seq.; G. E. Müller, Die Gesichtspunkte und
die Thatsachen der Psychophysischen Methodik, Wiesbaden, 1904, par. 11; F. M. Urban, “Die Psychophysischen
Massmethoden,” Archiv für die ges. Psychologie, 1909, XV, p. 287; G. H. Thomson, “The
Accuracy of the φ(γ) Process,” Brit. Journal of Psychol., 1914, VII, p. 46, and in various text-books,
e.g. Titchener’s Experimental Psychology, and W. Brown’s Essentials of Mental Measurement.
not here enter. We wish to know under what conditions the unknown weight
appears lighter than, equal to, or heavier than the standard weight. An important 
condition is of course the “actual” weight of the unknown weight, as measured in 
the usual manner. But this is by no means the only important condition. The 
order in which the weights are lifted (whether standard first or unknown first) ; the 
number of categories into which our judgments have to be classified; the order of
succession of the several unknown weights, whether rising or falling or at random ; 
the range over which the succession of unknown weights stretches, whether or no 
it contains any which are quite easily distinguished from the standard; all these 
and many other conditions are of great importance. Steps can however be taken 
to eliminate some of these factors, by means of judicious experimental precautions, 
and the attempt can be made to keep the others as constant as possible during 
a series of trials. The judgments which are given by the subjects then depend 
mainly on the difference between the standard stimulus and the variable stimulus ; 
in the case of our example on the difference between the standard weight and the 
variable weight. Among other points of importance in the fitting of the curves is 
the possibility of deciding by means of the goodness of fit whether the experimental
conditions have really been kept as constant as has been hoped, for lack of constancy 
in this respect will lead to heterogeneity which will show itself by the necessity of 
using a compound curve to obtain a good fit. 

To fix ideas, it is desirable at this point to have an actual set of data to refer to. 
In some very carefully conducted experiments on weight-lifting, Professor F.M. Urban 
(op. cit.) found that, with one of his subjects, under certain experimental conditions, 
the standard weight being 100 grams, the following numbers of answers heavier 
were returned, out of 300 trials with each of the several unknown weights. It 
should be mentioned that the experimental method used involved that the unknown 
weights were presented to the subject in random sequence, accompanied each by 
the standard, so that the 300 trials referred to were not one after the other, but 
were separated from each other by trials with the other unknowns. Otherwise 
expectation and other psychological factors produce a considerable correlation between
one judgment and the next, which is reduced to a minimum by Urban’s procedure. 
Moreover, precautions against fatigue and several other factors were taken. For the 
details the reader is referred to Urban’s memoir, with the warning that much of 
the mathematical part thereof is incorrect. 





Grams     Answers heavier     Proportion p
  84        7 out of 300         .0233
  88        8 out of 300         .0267
  92       35 out of 300         .1167
  96      107 out of 300         .3567
 100      183 out of 300         .6100
 104      265 out of 300         .8833
 108      279 out of 300         .9300





It is to numbers such as these that the curves to be considered are fitted. 
Biometrika XII
Any suitable curve which happened to occur to one might of course be employed.
For example, a parabola of higher order can be used, and the curve $\tan^{-1} x$ has also
been used. But clearly the whole experiment suggests that an error function of 
some sort is wanted, and as early as 1860 G. T. Fechner (op. cit.) suggested that 
such numbers formed the integral of a normal or Gaussian curve. One usual 
argument is somewhat as follows, using for clearness terms applying directly to the 
above example. 

The existence of a hypothetical point is postulated, called the limen or threshold 
for the judgment heavier, such that above this point the subject always returns the 
answer heavier, and below it he always returns some other answer, not heavier. 
But this limen is supposed to be fluctuating from moment to moment, either really 
or apparently, owing to changes in the physical, physiological and psychological 
conditions of the experiment. If at one moment the answer heavier is returned, 
for the variable 96 grams, then at that moment the limen is below 96 grams. 
Later the answer lighter, or the answer equal may be returned for 96 grams, and at 
that moment the limen is above 96 grams. The values p in the above table will 
then be integrals of the distribution curve of this limen. 


(2) Peculiarities of Psychophysical Data from the Point of View of 
Curve Fitting. 

The problem of fitting a distribution curve integral to such data, say in the 
first place the probability integral, has certain peculiarities which differentiate 
it from many biometric curve-fitting problems. 

Usually, when we are required to fit a normal curve, we are given the data in
histogram form:

[Figure: histogram whose cells contain the frequencies $m_1, m_2, m_3, \dots$]

That is, a number M of direct measurements is made, and $m_1$ are found to fall into
a certain short range, $m_2$ into another adjacent range, and so on. To fit a Gauss
curve requires the mean and the standard deviation, and these quantities can
be directly found from such a histogram, Sheppard’s adjustments being used if
necessary.























Quantities analogous to our proportions p can be formed from such a biometric 
histogram, viz. : 
$$p_1 = m_1/M,$$
$$p_2 = (m_1 + m_2)/M,$$
$$p_3 = (m_1 + m_2 + m_3)/M,$$

and vice versa, quantities analogous to the m’s can be formed from the proportions
p of the psychometric experiment, viz.:

$$m_1 = p_1 M,$$
$$m_2 = (p_2 - p_1) M,$$
$$m_3 = (p_3 - p_2) M.$$

In the case of our example we should have: 


Range (grams)     Cases m     m/M
Below 84              7      .0233
 84— 88               1      .0034
 88— 92              27      .0900
 92— 96              72      .2400
 96—100              76      .2533
100—104              82      .2733
104—108              14      .0467
Above 108            21      .0700

Totals              300     1.0000
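The formation of such a pseudohistogram from the observed numbers is a matter of successive differences, and may be sketched as follows in Python. The function name is hypothetical; the counts are those of Urban's subject quoted earlier, and the second example is an invented one showing how a cell becomes negative when the proportions do not rise steadily.

```python
def pseudohistogram(heavier_counts, trials):
    """Cell frequencies m formed from the numbers of answers 'heavier'.

    heavier_counts[k] is the number of answers heavier at the k-th stimulus
    (in ascending order of stimulus); trials is the number of trials made
    with each stimulus. The first cell lies below the lowest stimulus and
    the last cell above the highest.
    """
    cells = []
    previous = 0
    for count in heavier_counts:
        cells.append(count - previous)
        previous = count
    cells.append(trials - previous)
    return cells


# Urban's data: answers heavier out of 300 at 84, 88, ..., 108 grams.
urban_counts = [7, 8, 35, 107, 183, 265, 279]
urban_cells = pseudohistogram(urban_counts, 300)

# Invented data in which the proportion falls between two stimuli.
irregular_cells = pseudohistogram([10, 8], 20)
```

The cells necessarily add up to the number of trials, but nothing in their formation prevents an individual cell from being negative.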


There are however important differences which make the analogy inexact from the 
curve-fitting point of view. 


In the biometric histogram, if any one of the cells m is larger than it ought to
be, then any other must have a tendency to be smaller than it ought to be. There 
is a strong negative correlation between the numbers in the cells, a correlation, 
that is, from trial to trial. In the psychometric pseudohistogram however, formed 
from the proportions p, this is otherwise, because the p’s are measured quite 
separately from one another. 


In the biometric histogram the m’s, the numbers in each cell, are necessarily 
positive quantities. In the psychometric pseudohistogram they may be negative, 
if the p’s do not rise steadily. In the biometric histogram the actual range found 
in a trial is as a rule known, that is the points where p is zero and p is unity are
known. In the psychometric case these points are as a rule not known, and there 
may be psychological reasons why extreme stimuli (such as would be required to 
find these points) should not be used. In our example we do not know whether 
the subject would have given no answers heavier at 80 grams, or whether at the 
other end he would have given only answers heavier at 112 grams. 


When we do know these points, or can assume them, in the psychometric case, 
we can fit our probability integral by forming the pseudohistogram, and calculating 
the mean and the standard deviation as though it were a real histogram*. This has
been suggested by more than one writer, in England by Professor C. Spearman†,
who does not however point out the difficulty that it cannot as a rule be done, 
because the points p=0 and p=1 are not known. 


* The actual arithmetical formation of the histogram is unnecessary if a summation method of 
finding moments is employed. 
† Brit. Journ. Psychol., 1908, II, p. 227.
In biometric language, the problem is to fit a normal curve to data for which the
“tails” are undefined as to range, although their areas are known. This problem
was solved by Müller (op. cit.) as follows:


(3) The Constant Process. 


Call the stimuli $s_1, s_2, s_3, \dots, s_n$
and the proportions $p_1, p_2, p_3, \dots, p_n$,
then we have n equations

$$p - \frac{1}{2} - \frac{1}{\sqrt{\pi}}\int_0^{h(s - S)} e^{-z^2}\,dz = 0 \tag{1}$$

to find the mean S and the precision h. We retain for the present this form of the
integral as being more familiar to psychologists. The more modern form would
have the standard deviation instead of the precision as the second unknown.


These equations are slightly inconsistent with one another. No pair of values
S and h will exactly satisfy all n equations; instead of giving zero they leave small
residuals $v_p$.

Müller assumed tacitly that these n equations if based on the same number of
experiments each, are of equal importance or weight*. We shall allow this
assumption to pass for the present but shall return to it later.


If we now make the usual assumptions of the Method of Least Squares, we can 
take as the best values of S and h those which make 


$$\sum (v_p^2) \text{ a minimum,}$$

where the summation is over the n stimuli or n equations. The conditions that
this should be so are

$$\frac{\partial}{\partial h}\sum (v_p^2) = 0 \text{ for constant } S, \qquad \frac{\partial}{\partial S}\sum (v_p^2) = 0 \text{ for constant } h \tag{2}$$


Unfortunately, the n equations are very far from being simple and linear
as in usual applications of Least Squares. To avoid this difficulty we look up in
tables of the Probability Integral (which psychologists call Fechner’s Fundamental
Table) those n values of

$$\gamma = h(s - S) \tag{3}$$

which correspond exactly to our n values of p. These equations are not yet linear
in S and h, but if we write

$$hS = c \tag{4}$$

they become

$$\gamma = hs - c \tag{5},$$


* There is unfortunately a possibility of ambiguity of language here as the word weight also occurs 
in the particular example we are using as illustration, where weights of 84 grams etc. are employed. 





and are now linear in h and c. We have now n linear equations in which γ and s
are known, h and c are required. If we insert any pair of values h and c into these
n equations they will leave residuals $v_\gamma$. If we were now to proceed to make

$$\sum (v_\gamma^2) \text{ a minimum,}$$

this would not effect our purpose. It is $\sum (v_p^2)$ we wish to make a minimum, not
$\sum (v_\gamma^2)$. If however we can find multipliers or weights M such that each

$$M v_\gamma^2 = v_p^2,$$

we can then make $\sum (M v_\gamma^2)$ a minimum.


That is, we can apply Least Squares to the equations (3), weighted with certain
artificial weights M. The use of this device is Müller’s particular credit in this
connexion.


Clearly the residuals $v_p$, which may be regarded as errors in p, are connected
with the residuals $v_\gamma$, which may be regarded as errors in γ, by the equation

$$v_p = \frac{1}{\sqrt{\pi}}\,e^{-\gamma^2}\,v_\gamma$$

from equations (1) and (3). Therefore

$$M = e^{-2\gamma^2}/\pi.$$

Herein we can omit the π since it is only the relative values of the Müller weights
which are of importance. These weights are given in most works on psychophysics,
e.g. W. Brown, or Titchener, op. cit.


The condition that $\sum (v_p^2)$ should be a minimum has now become, that $\sum (M v_\gamma^2)$
should be a minimum. With this substitution, the Normal Equations (2) give

$$[Ms^2]\,h - [Ms]\,c = [Ms\gamma], \qquad -[Ms]\,h + [M]\,c = -[M\gamma] \tag{7}$$

the square brackets being the sign of summation used by Gauss, and still persisting
in psychophysics in this connexion. The summation here is over the n equations.
Thence we have

$$c = \frac{[Ms][Ms\gamma] - [M\gamma][Ms^2]}{[M][Ms^2] - [Ms]^2}, \qquad h = \frac{[M][Ms\gamma] - [Ms][M\gamma]}{[M][Ms^2] - [Ms]^2} \tag{8}$$

$$S = \frac{c}{h} = \frac{[Ms][Ms\gamma] - [M\gamma][Ms^2]}{[M][Ms\gamma] - [M\gamma][Ms]}$$
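The whole of the Constant Process, as described above, can be sketched in Python. This is an illustration only and not Müller's or Urban's own computation: the inverse of the probability integral is taken from the standard library's `NormalDist` in place of Fechner's table, the names are hypothetical, and the data are those of Urban's subject quoted earlier.

```python
import math
from statistics import NormalDist


def gamma_from_p(p):
    """Value of gamma for which 1/2 + (1/sqrt(pi)) * int_0^gamma exp(-z^2) dz = p.

    That integral equals the normal probability integral at sqrt(2) * gamma.
    """
    return NormalDist().inv_cdf(p) / math.sqrt(2.0)


def constant_process(stimuli, proportions):
    """Weighted least squares for gamma = h*s - c with Mueller weights."""
    gammas = [gamma_from_p(p) for p in proportions]
    weights = [math.exp(-2.0 * g * g) for g in gammas]  # the factor 1/pi is omitted

    M = sum(weights)
    Ms = sum(w * s for w, s in zip(weights, stimuli))
    Mss = sum(w * s * s for w, s in zip(weights, stimuli))
    Mg = sum(w * g for w, g in zip(weights, gammas))
    Msg = sum(w * s * g for w, s, g in zip(weights, stimuli, gammas))

    denominator = M * Mss - Ms ** 2
    h = (M * Msg - Ms * Mg) / denominator
    c = (Ms * Msg - Mg * Mss) / denominator
    return c / h, h   # the mean S and the precision h


grams = [84, 88, 92, 96, 100, 104, 108]
heavier = [7, 8, 35, 107, 183, 265, 279]
S_fit, h_fit = constant_process(grams, [k / 300 for k in heavier])
```

Because the weights depend on γ, which is itself read off from the observed p, the extreme stimuli with p near zero or unity contribute very little to the fitted values.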











(4) The Probability of a Certain Category of Judgment. 

The Constant Process remained in this form from 1879 to 1909. It is very 
much mixed up with the psychological method of experimenting and collecting the 
data, so that frequently the name “Method of Right and Wrong Cases,” really the 
name of a certain method of collecting data, has been used to include this mathematical
process. To avoid this mental confusion I have elsewhere* suggested that
the two words Method and Process should in psychophysics be consistently used in 
the way in which they are employed in the above sentence, viz. Method of collecting 
data, and Process of calculating. Frequently the Constant Process has been called 
the phi-gamma method, from the use of the name phi-gamma for the probability 
function. 


In 1909, F. M. Urban (op. cit.) suggested alterations to the Müller weights, or
rather suggested the necessity of another set of weights in addition†. These
alterations arise from the notion of comparing the judgments heavier with the 
drawing of black balls from a bag containing black balls and white balls. The 
analogy is in detail as follows. 


(1) From a bag containing black balls and white balls 300 drawings are made, 
one at a time, the ball being returned each time before the next drawing is made. 
107 black balls are observed out of the 300. 


(2) A subject on performing a certain experiment with weights sometimes 
gives the answer heavier, sometimes some other answer. On one occasion, when 
the weights were 100 grams standard and 96 grams unknown, this experiment was
repeated 300 times, with due precautions against fatigue, etc., and the answer 
heavier was returned 107 times out of the 300. 


Now if p is the observed proportion (here 107/300) of black balls drawn from a bag, then
the probable error of p is known to vary with $\sqrt{p(1 - p)}$‡. With the same sized
sample, a result p = .5 has a larger probable error than a result p = .8, say. If
anything similar holds, as the analogy suggests, for the psychometric experiment,
then the n equations (1) or (5) are not equally reliable, even although based on the
same number, 300, of experiments each. In addition to the weights M they need
other weights

$$\frac{1}{4p(1 - p)} = \frac{1}{4pq} \tag{9}$$


to allow for this new variation in reliability. The combined weights M/4pq are 
known as Urban’s weights, and a table of these is usually given in psychophysical 
textbooks alongside the ordinary Müller weights. Urban discusses the matter at
some length in his already cited article, and a discussion will also be found in 
Wirth’s Psychophysik (Leipzig, 1912) where on page 151 the actual scatter of 
various p’s is given in a diagram. 
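The two sets of weights may be compared by a short computation. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the inverse probability integral is taken from the standard library rather than from the published tables, and the constant factor 1/π is omitted from the Müller weight as in the text.

```python
import math
from statistics import NormalDist


def muller_weight(p):
    """Mueller weight exp(-2 * gamma^2) for an observed proportion p."""
    gamma = NormalDist().inv_cdf(p) / math.sqrt(2.0)
    return math.exp(-2.0 * gamma * gamma)


def urban_weight(p):
    """Combined weight M / (4 p q), with q = 1 - p."""
    return muller_weight(p) / (4.0 * p * (1.0 - p))


# Weights at a few illustrative proportions.
comparison = {p: (muller_weight(p), urban_weight(p)) for p in (0.5, 0.8, 0.95)}
```

At p = .5 the two weights coincide; away from the middle the extra factor raises the weight, since the observed proportion is there more precisely determined.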


* Brit. Journ. Psychol. 1912, v, p. 203. 

† There are many errors in the article of Urban’s quoted. See my articles in the Brit. Journ. Psychol.,
1913, VI, p. 217, and 1914, VII, p. 44. But these errors, though making Urban’s conclusions in that
article invalid, do not touch the point here raised, in which I think Urban’s suggestion marks an advance.

‡ Really the true values of p and 1—p should be used but this is the best we can do. And further,
the expression, probable error, ceases to have an accurate meaning when p is too close to zero or unity 
and the distribution in consequence is very skew. But these refinements do not matter at this stage of 
our argument. 
Replacing in equation (7) therefore the weights M by the new Urban weights P,
Urban found in the present instance
S = 98.24 grams,
h = 0.117995.
That is he represents the proportions p theoretically by using the hypothesis that
the “psychometric function,” as psychologists call it, is given by

$$p' = \frac{1}{2} - \frac{1}{\sqrt{\pi}}\int_0^{0.117995\,(98.24 - s)} e^{-z^2}\,dz.$$

The theoretical values p' thus calculated are compared with the actual values p in
this table.














Grams       p         p'      Difference
  84      .0233     .0088      + .0145
  88      .0267     .0438      − .0171
  92      .1167     .1489      − .0322
  96      .3567     .3544      + .0023
 100      .6100     .6155      − .0055
 104      .8833     .8319      + .0514
 108      .9300     .9483      − .0183
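The theoretical proportions can be recomputed from Urban's two constants by means of the error function, since the integral of exp(-z²) from 0 to x, divided by √π, is one half of erf(x). The Python sketch below is offered as a check of the arithmetic only; the function name is hypothetical.

```python
import math

S_URBAN = 98.24      # mean (grams), as found by Urban
H_URBAN = 0.117995   # precision, as found by Urban


def theoretical_proportion(s, S=S_URBAN, h=H_URBAN):
    """p' = 1/2 + (1/sqrt(pi)) * int_0^{h(s - S)} exp(-z^2) dz."""
    return 0.5 + 0.5 * math.erf(h * (s - S))


grams = [84, 88, 92, 96, 100, 104, 108]
observed = [k / 300 for k in (7, 8, 35, 107, 183, 265, 279)]
theoretical = [theoretical_proportion(s) for s in grams]
differences = [p - t for p, t in zip(observed, theoretical)]

for s, p, t, d in zip(grams, observed, theoretical, differences):
    print(f"{s:4d}  {p:.4f}  {t:.4f}  {d:+.4f}")
```

Small disagreements in the last figure are to be expected, since the constants are given to a limited number of places and the original values were taken from tables.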














The object of the present paper is to make clear the proper methods (a) of as- 
certaining, in all such cases, whether the theoretical numbers are a reasonable fit 
to the observed numbers, or not, and (b) of comparing the fits obtained by different 
hypotheses, that is by different error functions. The psychologist would express 
this by saying that he was comparing different psychometric functions. To the 
statistician the comparison is one of error functions, the natural procedure being to 
try first the normal curve, then members of Pearson’s family of curves, then 
compound curves; the conclusion in the latter case being that the material was 
not homogeneous. This work I have as a matter of fact already carried out, and 
have come to that conclusion; but it is beyond the scope of the present paper, 
which hopes to interest psychologists in modern statistical methods, and statisticians 
in modern psychology. 


(5) Pearson’s Criterion of Goodness of Fit. 


This problem, of comparing the goodness of fit of curves in psychophysics,
although it has not as far as I am aware ever been correctly performed, is really
very simple and could be handled at once from first principles. For the sake however
of showing the connexion with other work it is advisable to treat it as a special
case of the application of Pearson's Criterion of Goodness of Fit*, which is in brief
as follows.


* Karl Pearson, Phil. Mag., July 1900 and April 1916. 
Let
    $$x_1,\ x_2,\ x_3,\ \ldots,\ x_n$$
be a system of deviations from the means of n variables whose standard deviations are
    $$\sigma_1,\ \sigma_2,\ \sigma_3,\ \ldots,\ \sigma_n,$$
and intercorrelations
    $$r_{12},\ r_{13},\ r_{23},\ \ldots,\ r_{n-1,n}.$$

Then the frequency "surface" giving the frequency of occurrence of each possible
combination of x's is
    $$z = z_0\, e^{-\frac{1}{2}\chi^2} \qquad (11),$$
where
    $$\chi^2 = S_1\!\left(\frac{R_{kk}}{R}\,\frac{x_k^2}{\sigma_k^2}\right) + 2\,S_2\!\left(\frac{R_{kl}}{R}\,\frac{x_k x_l}{\sigma_k \sigma_l}\right) \qquad (12).$$
Herein R is the determinant
    $$R = \begin{vmatrix} 1 & r_{12} & r_{13} & \cdots & r_{1n} \\ r_{21} & 1 & r_{23} & \cdots & r_{2n} \\ \vdots & & & & \vdots \\ r_{n1} & r_{n2} & r_{n3} & \cdots & 1 \end{vmatrix} \qquad (13),$$
and $R_{kk}$, $R_{kl}$ are the minors corresponding to $r_{kk}$ and $r_{kl}$. $S_1$ is a sum over all k's,
and $S_2$ is a sum over all pairs kl other than k = l.


When χ² has been calculated, a probability P can be found from Table XII in
Pearson's Tables for Statisticians. This table is entered by n' = (n + 1) and χ²,
and gives values of
    $$P = \frac{\int_{\chi}^{\infty} e^{-\frac{1}{2}\chi^2}\,\chi^{n-1}\,d\chi}{\int_{0}^{\infty} e^{-\frac{1}{2}\chi^2}\,\chi^{n-1}\,d\chi} \qquad (14),$$
that is, P is the probability that a random sample of as bad a fit as the data, or
worse, would be obtained from the theory which is being tested. The kind of data
for which this criterion was first invented was data in real histogram form, of the
kind called in earlier sections of this paper a biometric histogram. When the data
are of this form, Pearson has shown that equation (12) reduces to the very simple
form
    $$\chi^2 = S\!\left(\frac{e^2}{m'}\right) \qquad (15),$$
where m' is the theoretical value of m, and e is m - m', and S indicates summation
over all the cells of the histogram. Psychophysical data of the kind here considered,
however, as has already been pointed out, are not really in histogram form.
Although a histogram can be deduced from them, it is only by making certain
assumptions, and the intercorrelations of the cells of this artificial histogram are
different from the intercorrelations of a natural directly observed histogram.
It is not correct therefore to use equation (15) above. It is more accurate and
withal exceedingly simple to apply equation (12) direct to the p's. Since the
latter are independent, all the intercorrelations r are zero. Therefore R is unity,
$R_{kk}$ is unity, and $R_{kl}$ is zero. Equation (12) therefore becomes
    $$\chi^2 = S\!\left(\frac{x^2}{\sigma^2}\right) \qquad (16);$$
and as the distributions of each p will be binomial in form provided the experimental
conditions remain constant enough we have
    $$\sigma^2 = \frac{p'q'}{\mu} \qquad (17)^{*},$$
where μ = the number of experiments on which p is based, and p' = 1 - q', so that
    $$\chi^2 = S\!\left(\frac{\mu x^2}{p'q'}\right) \qquad (18)^{\dagger}.$$

Herein the x's are the differences between observed p's and theoretical p's. The
probability P is then found as before.


* If we look upon the judgments heavier, as suggested in an earlier paragraph, as being comparable
with drawing black balls out of a bag containing black balls and white balls in the proportion p' and
1 - p', then the probable error of p is $.67449\sqrt{p'(1-p')/\mu}$, μ being the number of judgments of which
pμ are of the category heavier.

For the chances of obtaining 0, 1, 2, ... μ-1, or μ black balls in a drawing of μ are given by the
terms of
    $$(p' + q')^{\mu},$$
q' being 1 - p': that is, the chances of obtaining
    $$p = \frac{0}{\mu},\ \frac{1}{\mu},\ \frac{2}{\mu},\ \frac{3}{\mu},\ \ldots,\ \frac{\mu-1}{\mu},\ \frac{\mu}{\mu}.$$

The s. d. of the above binomial is $\sqrt{\mu p'q'}$ and the s. d. of p therefore is $\frac{1}{\mu}\sqrt{\mu p'q'} = \sqrt{p'q'/\mu}$.


† Compare Professor K. Pearson on "Goodness of Fit in Statistics and Physics," Biometrika, 1916,
XI, pp. 239-261, especially p. 257.

We can check our equation (18) by treating the matter from first principles, and not as a special case
included in Pearson's formulae. We have, from this point of view, n quantities p which are independently
measured, and n quantities p' which are theoretically given. The variations from p' are binomial in
form, that is, they are approximately Gaussian. The probability of an error
    $$x_k = p_k - p_k'$$
is therefore
    $$z_k = \frac{\sqrt{\mu}}{\sqrt{p_k' q_k'}\,\sqrt{2\pi}}\; e^{-\frac{\mu x_k^2}{2 p_k' q_k'}} \qquad (a).$$
The probability of the whole set of observed values $p_1, p_2, p_3, \ldots p_n$ occurring is the product
    $$z = z_1 z_2 z_3 \ldots z_n \qquad (b).$$
Write this $z = z_0\, e^{-\frac{1}{2}\chi^2}$.
Then
    $$\chi^2 = S\!\left(\frac{\mu x^2}{p'q'}\right)$$
from equation (a).
(6) Numerical Example.


Let us apply these formulae to the example already cited. The calculations
are carried out in the following table. The theoretical values p'q' should be used,
clearly, as denominators of the terms of χ².

















      p'       q'       p'q'         x²        x²/p'q'

    .0088    .9912    .00872     .00021025     .0241
    .0438    .9562    .04188     .00029241     .0070
    .1489    .8511    .12673     .00103684     .0082
    .3544    .6456    .22880     .00000529     .0000
    .6155    .3845    .23665     .00003025     .0001
    .8319    .1681    .13983     .00264196     .0189
    .9483    .0517    .04903     .00033489     .0069

                                               .0652 = S(x²/p'q')








The number of experiments was the same for each p, viz. 300, therefore
    $$\chi^2 = \mu\,S\!\left(\frac{x^2}{p'q'}\right) = 300 \times .0652 = 19.56^{*}.$$

The Table XII in Pearson's Tables to find P has to be entered with χ² and
n' = (n + 1), where n is the number of variates, here the number of p's, i.e. 7. We
find there

    n' = 8,    χ² = 19,  P = .008187;    χ² = 20,  P = .005570.
It is unnecessary, with data such as we are here handling, to interpolate elaborately.
Clearly, for χ² = 19.56, P is of the order
    P = .007.

That is to say, in only seven cases in a thousand should we expect to get our
present observed p's from our theoretical values p' by random sampling. It is therefore
not at all probable that the equation (1) truly represents the "psychometric
function" for this subject and this reaction.
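The arithmetic of this example can be repeated mechanically. The sketch below assumes that Table XII tabulates the upper-tail probability of χ² with n' - 1 degrees of freedom, and evaluates that tail by the standard closed-form series for whole-number degrees of freedom; the helper name chi2_sf is illustrative and is not taken from Pearson's Tables.

```python
import math

def chi2_sf(chi2, df):
    """Upper-tail probability of chi-square for a positive integer df."""
    if df % 2 == 0:
        # Even df: exp(-chi2/2) * sum_{r < df/2} (chi2/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (chi2 / 2.0) / r
            total += term
        return math.exp(-chi2 / 2.0) * total
    # Odd df: normal tail plus a finite series in odd powers of chi.
    chi = math.sqrt(chi2)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term                  # chi^(2r-1) / (1*3*...*(2r-1))
        term *= chi2 / (2 * r + 1)
    density = math.exp(-chi2 / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total

observed = [0.0233, 0.0267, 0.1167, 0.3567, 0.6100, 0.8833, 0.9300]
theory = [0.0088, 0.0438, 0.1489, 0.3544, 0.6155, 0.8319, 0.9483]
MU = 300  # judgments at each stimulus

# Equation (18): chi^2 = S( mu * x^2 / (p' q') )
chi2 = sum(MU * (p - t) ** 2 / (t * (1.0 - t)) for p, t in zip(observed, theory))
P = chi2_sf(chi2, df=len(theory))   # n' = n + 1 = 8, i.e. 7 degrees of freedom

print(f"chi^2 = {chi2:.2f}, P = {P:.4f}")
```

Working with unrounded quotients gives a χ² slightly below the 19.56 of the text, which was summed from four-figure entries; the resulting P still lies between the two tabular values quoted.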


(7) Urban’s incorrect method of comparing Goodness of Fit. 


In the article from which the above example is taken, Professor Urban was
inter alia desirous of comparing various hypotheses of the "psychometric function"
among themselves. Those which he fully works out are (1) the above assumption
that it is the integral of the normal probability curve, and (2) the assumption that
it is an "arctan." curve ($\tan^{-1}$). (It is needless to point out surely that the latter
hypothesis is in itself most unlikely; however, we are here concerned with an
empirical comparison of the two hypotheses, and it is important that the method
should be correct since it will be necessary to compare other and more likely
theories, as for example Pearson's curves.)


* Compare Appendix. 











It can now be shown that the methods which Professor Urban employed in
comparing these two hypotheses are incorrect and inadequate. What these inadequate
methods are can best be shown by continuing the above example, which
is taken at random from among Urban's material.


We have already found the squares x² of the differences between theory and
observation in the case of the normal integral, or as Urban calls it the φ(γ) hypothesis.
They are given in the table just above, and

    S(x²) = .00455189.

We now proceed to form the analogous quantity in the case of the arctan. hypothesis.














    Grams   Observed p     p'         x           x²

      84      .0233      .0795     - .0562     .00315844
      88      .0267      .1086     - .0819     .00670761
      92      .1167      .1682     - .0515     .00265225
      96      .3567      .3259     + .0308     .00094864
     100      .6100      .6464     - .0364     .00132496
     104      .8833      .8222     + .0611     .00373321
     108      .9300      .8872     + .0428     .00183184

                                               .02035695 = S(x²)


























Urban now compares the φ(γ) hypothesis with the arctan. hypothesis by comparing
    .00455189 with .02035695,

and deciding that as the former is smaller, therefore the φ(γ) hypothesis is superior.
This procedure is firstly inaccurate and secondly inadequate. It is inaccurate
because not S(x²) but S(x²/p'q') should be compared, and it is inadequate because
no idea is given whether the observed difference is significant or not.


The former point deserves a little more examination, because it is another form
of an error which Urban was the first to correct, in this same article. In the form
of the Constant Process as it left the hands of G. E. Müller, certain weights are to
be used on the observation equations. These weights may be called Müller's
weights. Urban pointed out, however, that they needed amendment, and published
(loc. cit.) a table of weights to replace them. These weights differ from Müller's by
the factor 1/4pq, which arises in Urban's treatment from an application of what he
calls Bernoulli's Theorem. It is these very Bernoulli weights, 1/pq, which Urban
himself has omitted in his above comparison of the φ(γ) and arctan. hypotheses.


In order to discuss the inadequacy of his comparison we need a measure of the
probable error of the quantity P used above.
(8) The Probable Error of χ² and P.


We have
    $$\chi^2 = \mu\,S\!\left(\frac{(p - p')^2}{p'q'}\right) \qquad \text{(from eqn. 18)}.$$

If the accurate values of p' were known, the variation of χ² would be due entirely
to the variations in the observed values p. In point of fact, of course, the values p'
which are available are themselves functions of the p's: but like Pearson in his
1914 article on the probable error of a coefficient of contingency*, and for the same
reason and with I think the same justification, we shall assume that the values p' do not
vary. Then the mean square deviation
    $$\sigma^2_{\chi^2} = \mu^2\,S\!\left\{\sigma_p^2\left(\frac{2(p - p')}{p'q'}\right)^2\right\} = 4\mu\,S\!\left(\frac{(p - p')^2}{p'q'}\right) = 4\chi^2 \qquad (19).$$
Therefore the probable error of χ² calculated in the way suitable for the Constant
Process and other processes for fitting psychometric functions is
    $$.67450\,\sigma_{\chi^2} = 1.349\,\sqrt{\chi^2} \qquad (20).$$

In the case above where χ² = 19.56, its probable error is therefore about 5.9, so
that we have
    χ² = 19.6 ± 5.9.


We must next find χ² and its probable error for the arctan. hypothesis. The
calculations are partly carried out above in finding S(x²). Completing them we
obtain the following table:

      p'       q'       p'q'         x²        x²/p'q'

    .0795    .9205    .07318     .00315844     .0431
    .1086    .8914    .09681     .00670761     .0693
    .1682    .8318    .13991     .00265225     .0189
    .3259    .6741    .21969     .00094864     .0043
    .6464    .3536    .22857     .00132496     .0058
    .8222    .1778    .14619     .00373321     .0254
    .8872    .1128    .10008     .00183184     .0183

                                               .1851 = S(x²/p'q')

    $$\chi^2 = \mu\,S\!\left(\frac{x^2}{p'q'}\right) = 300 \times .1851 = 55.53.$$
    Probable error of χ² = 1.349 √χ² = 10.0.

    For arctan., χ² = 55.53 ± 10.0.
    For φ(γ),    χ² = 19.6 ± 5.9.
    Difference      = 35.9 ± 11.6,

    where $11.6 = \sqrt{10.0^2 + 5.9^2}$.
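The whole comparison can be set out in a few lines. This is only a sketch using the tabulated proportions, with the probable error of χ² taken as 1.349 √χ² as in equation (20); the function names are illustrative.

```python
import math

observed   = [0.0233, 0.0267, 0.1167, 0.3567, 0.6100, 0.8833, 0.9300]
normal_fit = [0.0088, 0.0438, 0.1489, 0.3544, 0.6155, 0.8319, 0.9483]
arctan_fit = [0.0795, 0.1086, 0.1682, 0.3259, 0.6464, 0.8222, 0.8872]
MU = 300  # judgments at each stimulus

def sum_of_squares(obs, fit):
    """Urban's unweighted criterion S(x^2)."""
    return sum((p - t) ** 2 for p, t in zip(obs, fit))

def chi_square(obs, fit, mu):
    """Weighted criterion of equation (18): mu * S(x^2 / p'q')."""
    return mu * sum((p - t) ** 2 / (t * (1.0 - t)) for p, t in zip(obs, fit))

def probable_error(chi2):
    """Probable error of chi^2 from equation (20)."""
    return 1.349 * math.sqrt(chi2)

chi2_normal = chi_square(observed, normal_fit, MU)
chi2_arctan = chi_square(observed, arctan_fit, MU)
difference = chi2_arctan - chi2_normal
pe_difference = math.hypot(probable_error(chi2_normal), probable_error(chi2_arctan))

print(f"S(x^2): normal {sum_of_squares(observed, normal_fit):.8f}, "
      f"arctan {sum_of_squares(observed, arctan_fit):.8f}")
print(f"chi^2:  normal {chi2_normal:.2f} +/- {probable_error(chi2_normal):.1f}, "
      f"arctan {chi2_arctan:.2f} +/- {probable_error(chi2_arctan):.1f}")
print(f"difference {difference:.1f} +/- {pe_difference:.1f}")
```

Small discrepancies from the printed 19.56 and 55.53 arise because the text sums quotients already rounded to four places.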


* Biometrika, Vol. x. 
The difference is therefore three times its probable error and is just significant.
The final conclusion is therefore that in this particular case the arctan. hypothesis
is just significantly worse than the normal integral or φ(γ) hypothesis, but that the
latter itself is very improbable. The P of the normal integral hypothesis it will be
remembered was .007. The P of the arctan. hypothesis can be found from Table XII
of Pearson's Tables. The entry has to be made with n' = 7 + 1 = 8, and χ² = 55.53,
and we find given P = .000000, i.e. it is less than .0000005, showing how very
improbable the arctan. hypothesis is.


The probable error of P is discussed by Professor Pearson in the Phil. Mag. for
April 1916 and he shows that the standard deviation
    $$\sigma_P = \tfrac{1}{2}\left[P_{n'} - P_{n'-2}\right]\sigma_{\chi^2} \qquad (21),$$
and using equation (19) we get
    $$\sigma_P = \left(P_{n'} - P_{n'-2}\right) \times \sqrt{\chi^2} \qquad (22).$$
It must be borne in mind that n' = n + 1, where n = number of variates. In our
case therefore, n' is one more than the number of stimuli. $P_{n'-2}$ is similarly
obtained from Table XII of Pearson's Tables by entering with the column with
heading one less than the number of stimuli. For the above φ(γ) hypothesis
we have

    χ² = 19.6,
    Number of stimuli = 7,
    P or P_8 = .007 approximately,
    P_6 = .002 approximately,
    σ_P = (P_8 - P_6) × √χ² = .005 √19.6 = .022,
    Probable error of P = .6745 σ_P = .015.
Therefore for the φ(γ) hypothesis the criterion of goodness of fit is in this case
    P = .007 ± .015.
It is most improbable, therefore, that P is at all large, and the fit is significantly
a bad one. The probable error of P for the arctan. hypothesis is too minute to be
found from the table.
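The probable error of P can be evaluated without the printed table. The sketch below assumes, as before, that the column n' of Table XII corresponds to the χ² tail with n' - 1 degrees of freedom, so that the columns n' = 8 and n' - 2 = 6 are the tails for 7 and 5 degrees of freedom; chi2_sf is an illustrative helper and not part of Pearson's Tables.

```python
import math

def chi2_sf(chi2, df):
    """Upper-tail probability of chi-square for a positive integer df."""
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (chi2 / 2.0) / r
            total += term
        return math.exp(-chi2 / 2.0) * total
    chi = math.sqrt(chi2)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term                  # chi^(2r-1) / (1*3*...*(2r-1))
        term *= chi2 / (2 * r + 1)
    density = math.exp(-chi2 / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total

chi2 = 19.56
stimuli = 7

p_upper = chi2_sf(chi2, stimuli)        # column n' = 8
p_lower = chi2_sf(chi2, stimuli - 2)    # column n' - 2 = 6

sigma_p = (p_upper - p_lower) * math.sqrt(chi2)   # equation (22)
probable_error_p = 0.6745 * sigma_p

print(f"P = {p_upper:.4f}, sigma_P = {sigma_p:.4f}, probable error = {probable_error_p:.4f}")
```

The unrounded tails give a probable error close to the .015 of the text, which was obtained from P values rounded to three places.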
The calculations we have performed have been for Urban’s Subject IV (heavier 
answers). For his other data similar calculations can be carried out. The arctan. 
hypothesis is usually worse than the normal integral, but not always significantly 


worse, and the normal integral itself is an atrociously bad fit to the data in 
most cases. 


(9) Summary of Rules for Testing and Comparing Goodness of Fit
of Psychometric Curves.
Let there be n stimuli, and let
    $$p_1',\ p_2',\ p_3',\ \ldots,\ p_n'$$
be the theoretical frequencies at these stimuli, and
    $$p_1,\ p_2,\ p_3,\ \ldots,\ p_n$$
the observed values. Let
    $$\mu_1,\ \mu_2,\ \mu_3,\ \ldots,\ \mu_n$$
be the number of experiments at each stimulus. Calculate χ², the sum of the
quantities
    $$\frac{\mu_k\,(p_k - p_k')^2}{p_k'\,(1 - p_k')}.$$
Then in Table XII of Pearson's Tables*, in the column n' = n + 1 and the row χ²
(interpolate if necessary) find the value of P, the probability of obtaining the
observed p's or a worse set, from the values p' by random sampling.


The probable error of χ² is here approximately $1.35\sqrt{\chi^2}$ and the probable error of
P is approximately
    $$.6745\,(P_{n+1} - P_{n-1}) \times \sqrt{\chi^2},$$
where $P_{n+1}$ is P itself, and $P_{n-1}$ is the value found in Table XII by using the
(n - 1)th instead of the (n + 1)th column.


APPENDIX. 


What value will be obtained for χ² if, in the example used above (normal
integral hypothesis), we were to proceed by first forming a histogram and then
treating this histogram as though it were an ordinary directly observed one,
i.e. using equation (15) above? The cells of the histogram will be occupied by the
quantities m = δp × μ (observation) or m' = δp' × μ (theory) where δp is the change
in p from one stimulus to the next and μ the number of observations at each
stimulus, here the same throughout.
































      p       δp       p'      δp'     e/μ = δp - δp'     e²/μ²       e²/m'μ

             .0233            .0088        .0145        .00021025     .0239
    .0233            .0088
             .0034            .0350        .0316        .00099856     .0285
    .0267            .0438
             .0900            .1051        .0151        .00022801     .0022
    .1167            .1489
             .2400            .2055        .0345        .00119025     .0058
    .3567            .3544
             .2533            .2611        .0078        .00006084     .0002
    .6100            .6155
             .2733            .2164        .0569        .00323761     .0150
    .8833            .8319
             .0467            .1164        .0697        .00485809     .0418
    .9300            .9483
             .0700            .0517        .0183        .00033489     .0065

                                                                      .1239 = S(e²/m'μ)

whence χ² = 300 × .1239 = 37.17, instead of the proper value 19.56. If the calculation
is performed in this inaccurate way, therefore (by analogy with data which are
really in histogram form), a very wrong idea of the closeness of fit would be
obtained. The reason, as stated above, is that the correlations between the cells of
the histogram derived from an ogive with independently measured p's are not such
as to lead to equation (15).
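The contrast between the two procedures can be reproduced directly from the tabulated proportions. The sketch below forms the artificial histogram by differencing each ogive (padded with 0 and 1 at the ends) and applies equation (15) to it, alongside the direct calculation of equation (18); the helper name cell_contents is illustrative.

```python
observed = [0.0233, 0.0267, 0.1167, 0.3567, 0.6100, 0.8833, 0.9300]
theory   = [0.0088, 0.0438, 0.1489, 0.3544, 0.6155, 0.8319, 0.9483]
MU = 300  # observations at each stimulus

def cell_contents(cumulative):
    """Successive differences of an ogive, with 0 and 1 supplied at the two ends."""
    padded = [0.0] + list(cumulative) + [1.0]
    return [upper - lower for lower, upper in zip(padded, padded[1:])]

dp = cell_contents(observed)          # observed cell proportions
dp_theory = cell_contents(theory)     # theoretical cell proportions

# Equation (15) applied to the artificial histogram: S(e^2 / m') with m' = mu * dp'.
histogram_chi2 = MU * sum((a - b) ** 2 / b for a, b in zip(dp, dp_theory))

# Equation (18) applied direct to the independently measured p's.
direct_chi2 = MU * sum((p - t) ** 2 / (t * (1.0 - t)) for p, t in zip(observed, theory))

print(f"histogram method: {histogram_chi2:.2f}   direct method: {direct_chi2:.2f}")
```

The histogram treatment nearly doubles χ², which is the point of the Appendix: the cells of the derived histogram are not correlated in the way equation (15) presupposes.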


* Tables for Statisticians and Biometricians, Cambridge University Press, 1914. 




















ON CORRECTIONS FOR THE MOMENT-COEFFICIENTS OF 
LIMITED RANGE FREQUENCY DISTRIBUTIONS WHEN 
THERE ARE FINITE OR INFINITE ORDINATES AND 
ANY SLOPES AT THE TERMINALS OF THE RANGE. 


By ELEANOR PAIRMAN AND KARL PEARSON, F.R.S.


Part I. Non-Asymptotic Curves. 


(1) We have in recent practice found the importance of full corrections for the
moment-coefficients in the case of singly and doubly curtailed blocks of frequency
such as are indicated in the accompanying figure.

[Figure: a frequency block on the range from x_0 to x_p, divided into subranges h,
with ordinates y_0, y_1, y_2, ... y_p.]

It has not been adequately
recognised that even the mean of such distributions is not correctly obtained by
grouping at the midpoints of the subranges h, and merely finding the mean of
these concentrated groups. Still less is this a correct process in the case of the
higher moment-coefficients. The practical statisticians, aware possibly of the existence
of "Sheppard's corrections," have been warned that they are only exact for the
case of high contact, and regarding this have in their doubt neglected all corrections
whatever. Now Sheppard's corrections are still valid when there is no high contact,
and they should therefore always be used, but they form only part of the full
correction* and may indeed merely amount to some 50% of its value, although
75% is a more usual average proportion, if the frequency block does not end in
finite ordinates. We propose in the first part of this paper to deal with frequency


* In certain cases although part of the full correction they are in the wrong sense, and therefore if
used alone would be worse than the raw moments.

blocks such as are indicated in the figure above, reserving for the second part the
treatment of the corrections needful when the frequency curve asymptotes to the
frequency axis, i.e. the cases of J- and U-shaped frequency distributions. The
general treatment of non-asymptotic frequency blocks will follow the lines of
pp. 282-8 of the paper: "On the systematic Fitting of Curves," contributed in 1902
by one of the present writers to the first volume of this Journal.


(2) The method there adopted started from the Euler-Maclaurin formula:
    $$\int_{x_0}^{x_p} Z'\,dx = \left(\tfrac{1}{2}Z_0' + Z_1' + Z_2' + \ldots + Z_{p-1}' + \tfrac{1}{2}Z_p'\right)h - h\left[\frac{h}{12}\frac{dZ'}{dx} - \frac{h^3}{720}\frac{d^3Z'}{dx^3} + \frac{h^5}{30240}\frac{d^5Z'}{dx^5} - \frac{h^7}{1209600}\frac{d^7Z'}{dx^7} + \frac{h^9}{47900160}\frac{d^9Z'}{dx^9} - \frac{B_6 h^{11}}{12!}\frac{d^{11}Z'}{dx^{11}} + \ldots\right]_{x_0}^{x_p} \qquad (i),$$
where Z' is any function of x and $Z_0', Z_1', Z_2', \ldots Z_p'$ are the p + 1 values of this
function corresponding to p subranges taken from $x = x_0$ to $x = x_p$ of the range l.
Clearly $ph = l = x_p - x_0$. $B_6, B_7, \ldots$ are the higher Bernoulli numbers. The first
term on the right involving the p + 1 values of the function Z' is the "chordal
area"; the term between square brackets depends on the values of certain differential
coefficients at the ends of the range, and these again depend on the form we assume
for the frequency curve in the neighbourhood of the terminals. The value we are
going to take for Z' is $x^s Z$, where Z is the integral $\int_x^{x_p} y\,dx$, or, y being the frequency
ordinate, Z is the total frequency on the section $x_p - x$ of the range. In evaluating
the limits we need not proceed beyond the ninth differential, for the 11th vanishes
for s = 5 with our assumptions for Z, and in our experience of actual frequency the
ninth term as a rule contributes very little to the total correction. In order to
obtain our results we must assume some form for Z at the terminals of the frequency
block. Clearly at $x = x_0$, Z = N, i.e. the total frequency under consideration;
at $x = x_p$, Z = 0. We shall assume Z given by high order parabolae in the
neighbourhood of the terminals, i.e.
    $$Z = N\left(1 + \frac{a_1 (x - x_0)}{1!\,h_0} + \frac{a_2 (x - x_0)^2}{2!\,h_0^2} + \frac{a_3 (x - x_0)^3}{3!\,h_0^3} + \frac{a_4 (x - x_0)^4}{4!\,h_0^4} + \frac{a_5 (x - x_0)^5}{5!\,h_0^5}\right)$$
in the neighbourhood of $x = x_0$ and
    $$Z = N\left(\frac{b_1 (x_p - x)}{1!\,h_p} + \frac{b_2 (x_p - x)^2}{2!\,h_p^2} + \frac{b_3 (x_p - x)^3}{3!\,h_p^3} + \frac{b_4 (x_p - x)^4}{4!\,h_p^4} + \frac{b_5 (x_p - x)^5}{5!\,h_p^5}\right)$$
in the neighbourhood of $x = x_p$.

These lead at once to
    $$\left(\frac{d^s Z}{dx^s}\right)_{x = x_0} = N a_s / h_0^s \quad\text{and}\quad \left(\frac{d^s Z}{dx^s}\right)_{x = x_p} = N (-1)^s b_s / h_p^s \qquad (ii).$$
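The numerical divisors 12, 720, 30240, 1209600 and 47900160 in the Euler-Maclaurin formula are the values of (2k)!/|B_2k| for the modern Bernoulli numbers B_2, B_4, ... B_10. A short check in exact arithmetic follows; it is only a sketch, and the modern indexing used in the code differs from the older numbering B_1, B_2, ... of the text.

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli_numbers(count):
    """Return B_0 .. B_(count-1) in the modern convention (B_1 = -1/2)."""
    numbers = [Fraction(1)]
    for m in range(1, count):
        # Recurrence: sum_{k=0}^{m} C(m+1, k) B_k = 0
        acc = sum(comb(m + 1, k) * numbers[k] for k in range(m))
        numbers.append(-acc / (m + 1))
    return numbers

B = bernoulli_numbers(11)

# Divisor of h^(2k) d^(2k-1)/dx^(2k-1) in the series is (2k)! / |B_2k|.
divisors = [factorial(2 * k) / abs(B[2 * k]) for k in range(1, 6)]
print([int(d) for d in divisors])
```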


Exactly as in the earlier memoir we shall determine the a's and b's from five
frequencies adjacent to the terminals of the range. In many cases, however,
e.g. deaths in infancy, disease incidence in infancy, wages, incomes or house-valuations,
we have details for the ends of the range on different subranges to those for
the bulk of the curve. These modified subranges will be termed $h_0$ and $h_p$, and
when either of them is less than h, we shall get more accurate corrective terms if
they are used instead of the frequencies on subranges h. At the same time it must
be remembered that in calculating the value of the chordal area terms, sufficient
of these $h_0$ and $h_p$ subrange frequencies must be clubbed together to give
sub-frequencies on ranges h.


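Let the frequencies on the first five subranges $h_0$ or h from $x = x_0$ be $Nn_1'$, $Nn_2'$,
$Nn_3'$, $Nn_4'$, $Nn_5'$, so that $n_1', n_2', n_3', n_4', n_5'$ are proportional frequencies, then
    $$N(1 - n_1') = N\left(1 + \frac{a_1}{1!} + \frac{a_2}{2!} + \frac{a_3}{3!} + \frac{a_4}{4!} + \frac{a_5}{5!}\right),$$
or
    $$-n_1' = \frac{a_1}{1!} + \frac{a_2}{2!} + \frac{a_3}{3!} + \frac{a_4}{4!} + \frac{a_5}{5!}.$$
Similarly
    $$-n_1' - n_2' = \frac{a_1}{1!}2 + \frac{a_2}{2!}2^2 + \frac{a_3}{3!}2^3 + \frac{a_4}{4!}2^4 + \frac{a_5}{5!}2^5,$$
    $$-n_1' - n_2' - n_3' = \frac{a_1}{1!}3 + \frac{a_2}{2!}3^2 + \frac{a_3}{3!}3^3 + \frac{a_4}{4!}3^4 + \frac{a_5}{5!}3^5,$$
    $$-n_1' - n_2' - n_3' - n_4' = \frac{a_1}{1!}4 + \frac{a_2}{2!}4^2 + \frac{a_3}{3!}4^3 + \frac{a_4}{4!}4^4 + \frac{a_5}{5!}4^5,$$
    $$-n_1' - n_2' - n_3' - n_4' - n_5' = \frac{a_1}{1!}5 + \frac{a_2}{2!}5^2 + \frac{a_3}{3!}5^3 + \frac{a_4}{4!}5^4 + \frac{a_5}{5!}5^5.$$
Solving these equations we find
    $$\begin{aligned}
    a_1 &= -\tfrac{1}{60}\{137n_1' - 163n_2' + 137n_3' - 63n_4' + 12n_5'\},\\
    a_2 &= +\tfrac{1}{12}\{45n_1' - 109n_2' + 105n_3' - 51n_4' + 10n_5'\},\\
    a_3 &= -\tfrac{1}{4}\{17n_1' - 54n_2' + 64n_3' - 34n_4' + 7n_5'\},\\
    a_4 &= +\{3n_1' - 11n_2' + 15n_3' - 9n_4' + 2n_5'\},\\
    a_5 &= -\{n_1' - 4n_2' + 6n_3' - 4n_4' + n_5'\}
    \end{aligned} \qquad (iii).$$

Similarly we find for the b coefficients
    $$\begin{aligned}
    b_1 &= +\tfrac{1}{60}\{137n'_p - 163n'_{p-1} + 137n'_{p-2} - 63n'_{p-3} + 12n'_{p-4}\},\\
    b_2 &= -\tfrac{1}{12}\{45n'_p - 109n'_{p-1} + 105n'_{p-2} - 51n'_{p-3} + 10n'_{p-4}\},\\
    b_3 &= +\tfrac{1}{4}\{17n'_p - 54n'_{p-1} + 64n'_{p-2} - 34n'_{p-3} + 7n'_{p-4}\},\\
    b_4 &= -\{3n'_p - 11n'_{p-1} + 15n'_{p-2} - 9n'_{p-3} + 2n'_{p-4}\},\\
    b_5 &= +\{n'_p - 4n'_{p-1} + 6n'_{p-2} - 4n'_{p-3} + n'_{p-4}\}
    \end{aligned} \qquad (iv),$$

where $Nn'_{p-4}$, $Nn'_{p-3}$, $Nn'_{p-2}$, $Nn'_{p-1}$, $Nn'_p$ are the five successive frequencies
adjacent to the terminal $x = x_p$ of the subranges $h_p$ or h as the case may be.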
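The numerical coefficients of equations (iii) can be checked by solving the five linear equations in exact rational arithmetic. The sketch below assumes the system reads -(n_1' + ... + n_k') = a_1 k/1! + a_2 k^2/2! + ... + a_5 k^5/5! for k = 1, ... 5; the solver and variable names are illustrative.

```python
from fractions import Fraction
from math import factorial

def solve(matrix, rhs):
    """Gauss-Jordan elimination over exact fractions; rhs may have several columns."""
    n = len(matrix)
    aug = [list(matrix[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Row k (k = 1..5): coefficients k^j / j! multiplying a_j (j = 1..5).
lhs = [[Fraction(k ** j, factorial(j)) for j in range(1, 6)] for k in range(1, 6)]
# Row k of the right-hand side is -(n_1' + ... + n_k'), written as coefficients of n_1'..n_5'.
rhs = [[Fraction(-1) if i <= k else Fraction(0) for i in range(1, 6)] for k in range(1, 6)]

# coeffs[j - 1][i - 1] is the multiplier of n_i' in the expression for a_j.
coeffs = solve(lhs, rhs)

for j, row in enumerate(coeffs, start=1):
    print(f"a_{j}:", ", ".join(str(c) for c in row))
```

The fractions print in lowest terms, so that for instance 63/60 appears as 21/20. The b coefficients follow from the same system with the signs of the right-hand side reversed, which gives the alternating signs of equations (iv).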


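Since dZ/dx = -y, it follows that
    $$\begin{aligned}
    y_0 &= -\frac{N a_1}{h_0} = \frac{N}{60 h_0}\{137n_1' - 163n_2' + 137n_3' - 63n_4' + 12n_5'\},\\
    y_p &= \frac{N b_1}{h_p} = \frac{N}{60 h_p}\{137n'_p - 163n'_{p-1} + 137n'_{p-2} - 63n'_{p-3} + 12n'_{p-4}\}
    \end{aligned} \qquad (v).$$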


These results enable us to determine approximately the terminal ordinates of
the frequency distributions given by sub-frequencies, and to discover how nearly the
frequency curve comes to zero at the terminals of the range. Similarly the smallness
of the quantities $a_2, a_3, a_4, a_5$ and $b_2, b_3, b_4, b_5$ marks the character of the
terminal contact. At the same time the reader must remember two points (i) that
the terminal frequencies if small may be subject to large probable errors and
(ii) that we have supposed y = 0, when $x = x_0$ and $x = x_p$, the terminals of an integer
number of subranges. It is extremely unlikely that the frequency curve would cut
the variate axis exactly at such places. Hence on both counts, (i) and (ii), we must
not anticipate in actual practice that $a_1$ and $b_1$ will vanish at $x = x_0$ and $x = x_p$ for
non-abruptly terminating frequency, unless we know a priori the terminals of the
range and have chosen our subranges to fit this knowledge.





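(3) The next stage in our work must be to table the values of
    $$\frac{dZ'}{dx},\ \frac{d^3Z'}{dx^3},\ \ldots\ \frac{d^9Z'}{dx^9},$$
where $Z' = Zx^s$ at the two terminals of the range. We may do this for s = 0, 1, 2, 3, 4, 5.
The theorem of Leibnitz provides the needful expansions which are
    $$\frac{dZ'}{dx} = x^s\frac{dZ}{dx} + s x^{s-1} Z,$$
    $$\frac{d^3Z'}{dx^3} = x^s\frac{d^3Z}{dx^3} + 3s x^{s-1}\frac{d^2Z}{dx^2} + 3s(s-1)x^{s-2}\frac{dZ}{dx} + s(s-1)(s-2)x^{s-3}Z,$$
    $$\frac{d^5Z'}{dx^5} = x^s\frac{d^5Z}{dx^5} + 5s x^{s-1}\frac{d^4Z}{dx^4} + 10s(s-1)x^{s-2}\frac{d^3Z}{dx^3} + 10s(s-1)(s-2)x^{s-3}\frac{d^2Z}{dx^2}$$
    $$\qquad + 5s(s-1)(s-2)(s-3)x^{s-4}\frac{dZ}{dx} + s(s-1)(s-2)(s-3)(s-4)x^{s-5}Z,$$
    $$\frac{d^7Z'}{dx^7} = x^s\frac{d^7Z}{dx^7} + 7s x^{s-1}\frac{d^6Z}{dx^6} + 21s(s-1)x^{s-2}\frac{d^5Z}{dx^5} + 35s(s-1)(s-2)x^{s-3}\frac{d^4Z}{dx^4} + 35s(s-1)(s-2)(s-3)x^{s-4}\frac{d^3Z}{dx^3}$$
    $$\qquad + 21s(s-1)(s-2)(s-3)(s-4)x^{s-5}\frac{d^2Z}{dx^2} + 7s(s-1)(s-2)(s-3)(s-4)(s-5)x^{s-6}\frac{dZ}{dx}$$
    $$\qquad + s(s-1)(s-2)(s-3)(s-4)(s-5)(s-6)x^{s-7}Z,$$
    $$\frac{d^9Z'}{dx^9} = x^s\frac{d^9Z}{dx^9} + 9s x^{s-1}\frac{d^8Z}{dx^8} + 36s(s-1)x^{s-2}\frac{d^7Z}{dx^7} + 84s(s-1)(s-2)x^{s-3}\frac{d^6Z}{dx^6}$$
    $$\qquad + 126s(s-1)(s-2)(s-3)x^{s-4}\frac{d^5Z}{dx^5} + 126s(s-1)(s-2)(s-3)(s-4)x^{s-5}\frac{d^4Z}{dx^4}$$
    $$\qquad + 84s(s-1)(s-2)(s-3)(s-4)(s-5)x^{s-6}\frac{d^3Z}{dx^3} + \ldots \qquad (vi).$$
Now all higher differentials of Z than the fifth vanish, and therefore we may
cancel the first two terms of $d^7Z'/dx^7$ and the first four of $d^9Z'/dx^9$. The value of $d^{11}Z'/dx^{11}$
starts with the term $462s(s-1)(s-2)(s-3)(s-4)(s-5)x^{s-6}\,d^5Z/dx^5$, and accordingly
this and all terms beyond vanish for s = 0, 1, ... 5, or this differential of Z' is
zero for our purposes.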








We have now to give s in succession the values 0 to 5, and subtract the result 
for the first from those for the second terminal : 


vos EE Chae FEP-C-a)x 


BZ" tp .. & GZ’ \*» [dZ’ |» .. 
|. ey (-K = - [ar |. = gai Ws 








h 
b; Lp a; XL b, _ ae ) 
h,} hp +E t8(h 5) . 


bs tp 4 se) 4 s(&- a) w, | Goer |= if q>2; 


~ Mg! is hot hy h,* h,! da 


és ee 











Pert Peers (+R) ™ 
[te [= (— (ata) +20 (se BSR) — 2 +B) ™ 
Lae [= 42 (- oe) N, [Goer |2=9 9>85 

vt: (I (aig ao 
BET o(-(tagead ;) +9 (bs 7 = ~a,5") - 18 (0, j2+a,7°) — 6), 
ae ln (- (058 + aj) +15 (3, e- a, is) 


Xp a) by Me 
_60 (5 fat 73) + 60 fe ))¥, 
a Z' |, b, Hy . Os &% b, “)) 
lar. =(- 126 (5+ 7 Rt 210(54— igt)) 


qd4anZ’ |= . 
[aor [9 28s 
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Pores | fee Ly* a 4. 
swt: [EY a(n ga ae) 


kale (- (6, Heat js) +12 (bo 9 i, - a, 75) 


—an Tp 
36 (6, 2 "+a, 5°) — 240,) W, 


kak i (> (6, f ey i) +20 (6, re me is) ~120 (6, 2 +0, is) 

+ 240 (52 ie - a,j) - 120 (4 r)) y, 
kas (- 252 (5.5% Bat Fs +840(bs 72 — a ) - 840(74+ B))¥. 
fel: = 3024 : (Fr + a) y, | Geer |. 0, g>4; 


[ d*Z’ |p Me fy Xo ay' a To" 
BET (fend) om aga 














- 00 (0 + 0,2) - 
60 (0, hp +a ho 602.) “ 
DZ" \2, ey a? x? &, x" 2,> is) 
Saar [a (— (i + Fe) +25 (6 fe — Fe) 200 (6,98 +07 
+ 600 (64 7% — a, 5) ~ 60 0(b,5 je +a 5) — 120) W, 
i X,* D) : 
[22 [= (sap en)s rE SE) 
4 b, S 
4200 (6, 7 wate *) +2520 (74 — 73), 
a Z? ze be 
EK [: = (— 15120 (6, 5% +a 73) + 15120 (rs - 7) 
dati Z’ | 2, ve 
| eer =0, g>4....(vii). 


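(4) We have now to see the relation of the present integral $\int_{x_0}^{x_p} Z x^s\,dx$ to the
moment-coefficients. Integrating by parts we have
    $$\int_{x_0}^{x_p} Z x^s\,dx = \left[\frac{Z x^{s+1}}{s+1}\right]_{x_0}^{x_p} + \int_{x_0}^{x_p} \frac{y\,x^{s+1}}{s+1}\,dx = -\frac{N x_0^{s+1}}{s+1} + \frac{N\mu'_{s+1}}{s+1},$$
where $\mu'_{s+1}$ is the (s + 1)th moment-coefficient about the arbitrary origin. Accordingly
we have, changing s to s - 1,
    $$\mu'_s = x_0^s + \frac{s}{N}\int_{x_0}^{x_p} Z x^{s-1}\,dx.$$
Thus we can write
    $$\mu'_s = x_0^s + \frac{s}{N}C_{s-1} - \frac{s}{N}L_{s-1} \qquad (ix),$$
where $C_{s-1}$ is the "chordal area" term and $L_{s-1}$ is the limit term of the Euler-Maclaurin
series, or if $x_u = x_0 + uh$,
    $$C_{s-1} = h\left\{\tfrac{1}{2}Z_0 x_0^{s-1} + Z_1 x_1^{s-1} + Z_2 x_2^{s-1} + \ldots + \tfrac{1}{2}Z_p x_p^{s-1}\right\},$$
    $$L_{s-1} = h\left[\frac{h}{12}\frac{d(Zx^{s-1})}{dx} - \frac{h^3}{720}\frac{d^3(Zx^{s-1})}{dx^3} + \frac{h^5}{30240}\frac{d^5(Zx^{s-1})}{dx^5} - \frac{h^7}{1209600}\frac{d^7(Zx^{s-1})}{dx^7} + \frac{h^9}{47900160}\frac{d^9(Zx^{s-1})}{dx^9}\right]_{x_0}^{x_p}.$$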


We now turn to the evaluation of the chordal areas. We can obtain these by
remembering that if $n_q$ be the frequency on the qth subrange h,
    $$Z_q = \int_{x_q}^{x_p} y\,dx = n_{q+1} + n_{q+2} + \ldots + n_p.$$
Thus it follows that:
    $$C_{s-1} = h\,\underset{u=1}{\overset{p}{S}}\left\{\tfrac{1}{2}x_0^{s-1} + (x_0 + h)^{s-1} + (x_0 + 2h)^{s-1} + \ldots + (x_0 + (u-1)h)^{s-1}\right\} n_u,$$
if we note that $Z_p = 0$.


But the series coefficient of n, can itself be summed by the Euler-Maclaurin 
Theorem, i.e. 


h {hae + (a+ ht ... + (a+ (u—1) hy} 





+uh ¢ —I 3 (8-1 
= [edt — Bh (ay + why + Ee uci — 
; a = 
ahs 
30sa0 da? ies 


= — th (a + uh + = (e + uh) +(s—1) A (a + uh} 


—(s—1)(s—2)(s—3) zi gh* (a + uh) 
+(s—1)(s—2)(s—3)(s—4)(s-- 5) sydqgh® (a + uh... 


he | 
m * a! —(8=1) pga? + (8-1) (8-2) (8-8) pfghtay— 


— (s—1)(s — 2) (s — 8) (8 — 4) (s —5) sgbqgh®ay'* + «... 


We are now in the position to find the value of sC,_, for the successive values 


of s. We have 


1 


1 z 
N (8C,1) == WV S {- th + % + uh — Lo} Ny 


8 (a, +(w— 4) h} my —% 


1 g ne 
N u=1 Bast (Re) By — My vcesiccccscae (x), 
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where »,’= moment about the origin of the subrange frequencies concentrated at 
their mid subrange points. Similarly 


Fy (80s)ana = -73, {— h(a» + uh) + (ar + ub)? + hh? — a2 — fh} ny 


=H 5 Meet (wd) B28} mu 


1 
WV (sC,_1)e=s 5 i 


Fe {(@+(u— —4h)hP — 4h? (a+ (u—$)h)— a8 — bho} my 


7 (80, -1)p-4 = 8 {— 2h (a, + wh) + (xy + uh) + h? (a, + uh) 
=1 


= ah = Bo" — h?a,2 + ah } Nu 


=| 


-75 {(a@ + (u— 4) h)§— fh? (a +(u— 4) hl + aht—a,t —h?a,?} ry 
Ig — Erg + ph! — tre — Pia oo. ecccceceececcecesceseeenes (xiii) ; 
1 I 2 ‘ 
V (80,_:)sas = W So {— $h (x + uh) + (a + wh) + Sh? (a, + uh) 


— tht (a + uh) — a — $h?ae + thie} ny 
Pp 
= 5B {at (u— 4) hy —§h (e+ (ug) A) 

+ Fgh! (a + (u —4)h)— a — Sh2as + hte} nu 
=vs — $v) + Zhen! — 2 — Sha? + tha, 
BS {— Bh (a, + uh)? + (ay + wh)’ + GR? (ay + uh)? 

— thi (a+ uh? + Ah? — a — Shia + ghia? — Ah} nw 


= Hy SB Moot (u— Bh — Be (a, + (u- 4) hy 


1 


| anaes rs a 
PV (8Cs-aens =N 


N, 
+ igh! (ay + (u— 4) hy — Ph — a — Shas + ghtae} nw 
- Shy! + Fgh, = gh —26— Shea + shia? 


(5) We can now put together the complete formulae for the corrected moment-coefficients
about any origin from the values we have obtained for the component
parts of (ix), but we may first simplify our notation slightly so as to abbreviate 
somewhat the lengthy resulting expressions. We write ppy=h/hp, po =h/hy; these 
will very frequently be unity. Next we put b,’=6,p,', a,’=a,p,°, and a,/h = ay’, 
x/h = x; thus for terminal units the same as for the bulk of the frequency we 
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should have 6,’ = b,, a, =a,, and for working units where A generally = 1, Ly = Ly, 
X,=%. We find 


pa = vy +h {ay (ay — ys + gebsy ts) + py (br — agbs + aptzgs’)} cece ecco eeeeee (xvi). 
“us! = vo ie. Pgh? eu h2 {— th (a, as rhe) 4 vhs (b,/ as rich’) 
+ fay (a) — yds’ + xpgy 4s) + hap (bi — Pods’ + apaghs)} ---eceeeeeeeees (xvii). 


bs = vs — EhPv’ +h? {— By (ay’ — gy as' + zh5 Gs) — dy (hi — Fybs’ + zhybs) 

— p5% (a2' — rhe’) + py Mp’ (de! — rhe bs’) + 4.20? (ay' — gyas' + zey%s ) 

te Ra GS = Bey & eles I ons. cn ceccscsucssccsccdtereestaveseccctensetees (xviii). 
pg = V4 — ghPv ys + ghgh + ht rhe (ae! — yas’) — rhe (by — deby’) 

— Poe (ch’ — Py Os' + ghy de) — Po Mp’ (b — Fybs’ + ahybs') 

— to%o” (da! — hg Ga’) + Py ty? (be! — tha bs) + $40° ar’ — gods’ + gezo 4) 

Ee Oy as Pe x vntccecsessencesesvispuescseastadeceoseapeeal (xix). 
bs = V5 — Rh vy + gghtvy + h® {aby (ay — gods’ + ahy ds) + ahs (bY — Bobs’ + zine’) 

+ he’ (A2' — Fy as’) — rhe tp’ (be — Fyb,’) — 4a," (ay — Peas + zhyas) 

— fay” (by — gigbs’ + xhebs’) — fy te (ae’ — zoe 0s') + Py ep* (dy — 75 ,') 

+ Pyay* (ay’ — dy as’ + apg Gs) + Peep! (dy — gods’ + apg bs)} ..--eeeeeeee (xx). 
Me =e — Sh? vy, + hiv! — igh’ + ho [— dy (a’ — fyay’) + ch (D: — Fy’) 

+ fy @o' (dy! — Gps! + gp as) + Peep (bi — Fo bs' + ake bs’) + Fea” (ae — Fay’) 

— fy” (ba! — Fybi) — hare" (ay — Pgas + zho 4s) — dary (by — gud,’ + thybs) 

— $a (ae — pegs’) + $Xp'* (be — z¥eb,') + 420° (ay — gods + apn Os) 

ad ik Meet Meat, ee ee (xxi). 

The first series of terms outside the curled brackets are precisely the Sheppard's
corrections for the moments which accordingly still remain essential portions of the
corrective terms even when there are finite terminal ordinates and any degree of
abruptness in the slopes at the end of the range. We may speak of the a's and b's
as the "abruptness coefficients." They are determined by equations (iii) and (iv).


It will be clear that the terms in the curled brackets repeat themselves, so that
in working as we usually do to the fourth moment we have to deal only with eight
functions of the abruptness coefficients.


The next stage is to consider how equations (xvi)-(xxi) may be most advantageously
arranged for practical statistical work. In all such work the subrange
h is taken as unity. Hence we may always write it 1. Further, the origin is at
our choice, and it might seem desirable to take it at the mean. But there are two
means, namely the true mean $\mu_1'$ of the data and the mean $\nu_1'$ of the concentrated
groups; with abruptness these are no longer identical. If we take moments about
the true mean, $\nu_1'$ is not zero and our calculations are not simplified by its vanishing.
On the other hand if we take moments about the mean of the concentrated groups
we shall then have to transfer the μ's to their true mean. Nothing is therefore
gained practically by the process. Besides this, neither mean is a good working
origin, and if we take such to calculate the moments in the first place we may have
two transfers to the means, one for the ν's and one for the μ's. Further $x_0$ and $x_p$
will both have to be calculated and all the terms used. We can, however, get rid
of slightly less than half the corrective terms, if we take moments about one end of
the range* and then transfer the μ's to their mean. This appears to us in practice
to be the best policy, for, although it involves taking the differences of large numbers,
it is quite easy with modern mechanical calculators to retain the requisite number
of figures for accurate results. Accordingly we will rewrite our formulae with these
changes, remembering that $x_p$ is now the range l = ph = p.





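$$\mu_1' = \nu_1' + \left\{\tfrac{1}{12}\left(a_1' - \tfrac{1}{60}a_3' + \tfrac{1}{2520}a_5'\right) + \tfrac{1}{12}\left(b_1' - \tfrac{1}{60}b_3' + \tfrac{1}{2520}b_5'\right)\right\} \tag{xxii}$$

$$\mu_2' = \nu_2' - \tfrac{1}{12} + \left\{-\tfrac{1}{120}\left(a_2' - \tfrac{5}{126}a_4'\right) + \tfrac{1}{120}\left(b_2' - \tfrac{5}{126}b_4'\right) + \tfrac{1}{6}p\left(b_1' - \tfrac{1}{60}b_3' + \tfrac{1}{2520}b_5'\right)\right\} \tag{xxiii}$$

$$\mu_3' = \nu_3' - \tfrac{1}{4}\nu_1' + \left\{-\tfrac{1}{40}\left(a_1' - \tfrac{5}{63}a_3' + \tfrac{1}{240}a_5'\right) - \tfrac{1}{40}\left(b_1' - \tfrac{5}{63}b_3' + \tfrac{1}{240}b_5'\right) + \tfrac{1}{40}p\left(b_2' - \tfrac{5}{126}b_4'\right) + \tfrac{1}{4}p^2\left(b_1' - \tfrac{1}{60}b_3' + \tfrac{1}{2520}b_5'\right)\right\} \tag{xxiv}$$

$$\mu_4' = \nu_4' - \tfrac{1}{2}\nu_2' + \tfrac{7}{240} + \left\{\tfrac{1}{126}\left(a_2' - \tfrac{7}{80}a_4'\right) - \tfrac{1}{126}\left(b_2' - \tfrac{7}{80}b_4'\right) - \tfrac{1}{10}p\left(b_1' - \tfrac{5}{63}b_3' + \tfrac{1}{240}b_5'\right) + \tfrac{1}{20}p^2\left(b_2' - \tfrac{5}{126}b_4'\right) + \tfrac{1}{3}p^3\left(b_1' - \tfrac{1}{60}b_3' + \tfrac{1}{2520}b_5'\right)\right\} \tag{xxv}$$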
fs = V5 — Rvs + fers + [aby (Gi — Fyas + zhyds') + aby (bi — Fybs' + zhabs') 


— hap (0. — dybs’) — tp* (b) — pybs’ + atybs’) + gop? (be — h55,') 

ry as fH mace te Bh so sscnvssecesepicpttcncresebsreapesscstieratecd’ (xxvi). 
fe = Ve — Sud + qv — ahh + [— dy (Ge — Peay’) + dy (be — Fy by’) 

+ Pop (by — fobs + rhode’) — Fyp? (be — Fabs’) — bp* (by — dys’ + zhyb') 

+ $p* (b. 7. zea b,’) + $p* (b,' = ays’ + x09 0s )} ee ecccccccescccseccccess (xxvii). 
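The corrective terms up to the fourth moment can be sketched in code. This is a minimal sketch, not the authors' procedure: the fractions used (1/60, 1/2520, 5/126, 5/63, 1/240, 7/80 and the outer multipliers) are inferred from the worked illustrations that follow, and the function names are hypothetical. Moments are taken about one end of the range with $h = 1$.

```python
def abruptness_functions(c):
    """Four functions of the five scaled coefficients c = (c1', ..., c5')."""
    c1, c2, c3, c4, c5 = c
    f1 = c1 - c3 / 60 + c5 / 2520
    f2 = c2 - 5 * c4 / 126
    f3 = c1 - 5 * c3 / 63 + c5 / 240
    f4 = c2 - 7 * c4 / 80
    return f1, f2, f3, f4


def abruptness_corrections(A, B, p):
    """Bracketed terms for moments 1..4 about the first terminal.

    A and B are the four functions of the a's and of the b's;
    p is the number of subranges (the range, since h = 1).
    """
    A1, A2, A3, A4 = A
    B1, B2, B3, B4 = B
    c1 = A1 / 12 + B1 / 12
    c2 = -A2 / 120 + B2 / 120 + p * B1 / 6
    c3 = -A3 / 40 - B3 / 40 + p * B2 / 40 + p ** 2 * B1 / 4
    c4 = A4 / 126 - B4 / 126 - p * B3 / 10 + p ** 2 * B2 / 20 + p ** 3 * B1 / 3
    return c1, c2, c3, c4


def sheppard_terms(nu1, nu2):
    """Sheppard's terms added to nu1', nu2', nu3', nu4' (h = 1)."""
    return 0.0, -1 / 12, -nu1 / 4, -nu2 / 2 + 7 / 240
```

Fed with the function values printed for the doubly truncated normal curve of Illustration II (b) (five groups, $p = 5$), the sketch returns the corrective terms quoted there to about seven decimals.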


(6) We now propose to illustrate the degree of exactness with which it is 
possible to obtain the moment-coefficients of curves with marked degrees of abrupt- 
ness, and further to investigate in practice the extent to which small terminal range 
elements may be of advantage. We will commence with some mathematical 
frequency distributions for which it is possible to calculate the exact values of the 
moment-coefficients. 


Illustration I. Moment-coefficients of the common parabola $y = \sqrt{x} \times 100{,}000$
from $x = 0$ to 10. This is a good case for a test, for the curve rises vertically at
$x_1 = 0$, and therefore, theoretically, our equations fail. At $x_p = 10$, we have a finite
ordinate and finite abruptness coefficients. We are hardly likely to get a case
wherein the abruptness causes greater changes in the grouped frequency moments,
or to which it is less possible a priori to apply merely Sheppard's corrections.


* The distances of the successive concentrated groups are $\tfrac{1}{2}h$, $\tfrac{3}{2}h$, $\tfrac{5}{2}h$, .... In taking moments it is
convenient to use 1, 3, 5, etc. and then before substitution multiply $\nu_s'$ by $(.5)^s$.

We divide the space from $x = 0$ to $x = 10$ into ten subranges giving the following
system of "frequencies":


               Absolute                      Proportional
               frequencies                   frequencies
  $n_1$           66,667        $n_1'$      = .031,623
  $n_2$          121,895        $n_2'$      = .057,820
  $n_3$          157,848        $n_3'$      = .074,874
  $n_4$          186,923        $n_4'$      = .088,665
  $n_5$          212,023        $n_5'$      = .100,571
  $n_{p-4}$      234,440        $n_{p-4}'$  = .111,205
  $n_{p-3}$      254,888        $n_{p-3}'$  = .120,904
  $n_{p-2}$      273,811        $n_{p-2}'$  = .129,880
  $n_{p-1}$      291,505        $n_{p-1}'$  = .138,273
  $n_p$          308,185        $n_p'$      = .146,185
  Total frequency = 2,108,185                 1.000,000
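The grouped frequencies and raw moments of this illustration can be recomputed directly from the curve. The sketch below is an illustration only: the function names are hypothetical, the groups are concentrated at the midpoints .5, 1.5, ..., 9.5, and unrounded areas are used, so the last printed figures need not be reproduced exactly.

```python
def parabola_group_frequencies(p=10, scale=100_000):
    """Areas under y = scale * sqrt(x) over the unit subranges of (0, p)."""
    def area_to(x):
        return scale * (2.0 / 3.0) * x ** 1.5
    return [area_to(k) - area_to(k - 1) for k in range(1, p + 1)]


def raw_moments(freqs, max_order=4):
    """Raw moment-coefficients about x = 0, groups concentrated at midpoints."""
    total = sum(freqs)
    return [
        sum(f * (k + 0.5) ** n for k, f in enumerate(freqs)) / total
        for n in range(1, max_order + 1)
    ]


freqs = parabola_group_frequencies()
nu = raw_moments(freqs)
```

Here `round(sum(freqs))` gives the total frequency 2,108,185, and `nu[1]` agrees with the raw second moment 42.6900 used below.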


These lead to 


Abruptness Differentials and Ordinate

                                  Calculated            Actual
  $a_1$ = -.0131,0643    $y_0$       =  27630.78          0
  $a_2$ = -.0444,8167    $y_0'$      = +93775.59          ∞
  $a_3$ =  .0258,4150    $y_0''$     = -54478.66          ∞
  $a_4$ = -.0148,8400    $y_0'''$    = +31378.23          ∞
  $a_5$ =  .0045,0200    $y_0^{iv}$  =  -9491.05          ∞


Now it is clear that the ordinate of our auxiliary curve is not zero, but it looks
larger than it really is: relative to the ordinate at the other terminal, which is
316,227.77, the ratio is only .087, so that if the curve be actually drawn to any
reasonable scale, the ordinate of the auxiliary curve at the vertex, which is less than
one-tenth of that at the other terminal, looks relatively small. We may also compare
it with the ordinate of the actual curve at $x = 1$, which is 100,000, or between
three and four times as great. Similarly the abruptness differentials are not infinite,
but their values in the actual curve are very considerable at $x = 1$ and are then:
$y_1' = 50{,}000$, $y_1'' = -25{,}000$, $y_1''' = 37{,}500$ and $y_1^{iv} = -93{,}750$. Thus the first two
for the auxiliary curve are about double, the third of the same order, but the fourth
is much less. Clearly all this is a result of the fact that we cannot expand $\sqrt{x}$ in
a series of integer powers of $x$, and this is one of the reasons why we selected it.
We want to determine whether the formulae give a very bad result for the moments
even in the case of extreme abruptness. Accordingly our real test lies in the values
of the deduced moment-coefficients and not in those of the abruptness differentials.


We pass now to the b's and find:

                                  Calculated            Actual
  $b_1$ =  .1499,9857    $y_p$       = 316,224.74       316,227.77
  $b_2$ = -.0074,9283    $y_p'$      = -15,796.27       -15,811.39
  $b_3$ = -.0003,9450    $y_p''$     =    -831.68          -790.57
  $b_4$ = -.0000,2600    $y_p'''$    =     -54.81          -118.59
  $b_5$ = -.0000,3800    $y_p^{iv}$  =     -80.11           -29.65


Here again the values of the terminal ordinate and the abruptness coefficients,
although good in the former case, are only approximations and the real test must
depend on the moment-coefficients. If we go as far as $\mu_4'$ we have to calculate
the following eight expressions:

  $a_1' - \tfrac{1}{60}a_3' + \tfrac{1}{2520}a_5'$ = -.0131,0643 - .0004,3069 + .0000,0179 = -.0135,3533,
  $a_2' - \tfrac{5}{126}a_4'$ = -.0444,8167 + .0005,9063 = -.0438,9104,
  $a_1' - \tfrac{5}{63}a_3' + \tfrac{1}{240}a_5'$ = -.0131,0643 - .0020,5092 + .0000,1876 = -.0151,3859,
  $a_2' - \tfrac{7}{80}a_4'$ = -.0444,8167 + .0013,0235 = -.0431,7932.

The quantities we require are

  $\tfrac{1}{12}(a_1' - \tfrac{1}{60}a_3' + \tfrac{1}{2520}a_5')$ = -.0011,2794,    $\tfrac{1}{120}(a_2' - \tfrac{5}{126}a_4')$ = -.0003,6576,
  $\tfrac{1}{40}(a_1' - \tfrac{5}{63}a_3' + \tfrac{1}{240}a_5')$ = -.0003,7846,     $\tfrac{1}{126}(a_2' - \tfrac{7}{80}a_4')$ = -.0003,4269.

It will be seen from these results that $a_3'$ does not contribute very much and
$a_5'$ still less to the final corrections. We now take the b's and find

  $b_1' - \tfrac{1}{60}b_3' + \tfrac{1}{2520}b_5'$ = .1500,0513,    $b_2' - \tfrac{5}{126}b_4'$ = -.0074,9273,
  $b_1' - \tfrac{5}{63}b_3' + \tfrac{1}{240}b_5'$ = .1500,2972,     $b_2' - \tfrac{7}{80}b_4'$ = -.0074,9055.

Whence we deduce for the abruptness functions, since $p = 10$:

  $\tfrac{1}{12}(b_1' - \tfrac{1}{60}b_3' + \tfrac{1}{2520}b_5')$ = .0125,0043,          $\tfrac{1}{6}p\,(b_1' - \tfrac{1}{60}b_3' + \tfrac{1}{2520}b_5')$ = .2500,0860,
  $\tfrac{1}{4}p^2(b_1' - \tfrac{1}{60}b_3' + \tfrac{1}{2520}b_5')$ = 3.7501,2825,       $\tfrac{1}{3}p^3(b_1' - \tfrac{1}{60}b_3' + \tfrac{1}{2520}b_5')$ = 50.0017,1000,
  $\tfrac{1}{120}(b_2' - \tfrac{5}{126}b_4')$ = -.0000,6244,                    $\tfrac{1}{40}p\,(b_2' - \tfrac{5}{126}b_4')$ = -.0001,8731,
  $\tfrac{1}{20}p^2(b_2' - \tfrac{5}{126}b_4')$ = -.0374,6865,                  $\tfrac{1}{40}(b_1' - \tfrac{5}{63}b_3' + \tfrac{1}{240}b_5')$ = .0037,5074,
  $\tfrac{1}{10}p\,(b_1' - \tfrac{5}{63}b_3' + \tfrac{1}{240}b_5')$ = .1500,2972,          $\tfrac{1}{126}(b_2' - \tfrac{7}{80}b_4')$ = -.0000,5945.
We now give the values of the grouped moment-coefficients about the origin.
Alongside them we place their values as corrected by Sheppard's terms. We then
give the values as found by full correction formulae and lastly the actual values as
deduced by integrating the parabola.

                          Values with Sheppard's                                  Values with full
  Raw moments             corrections                                             corrections              Actual values
  $\nu_1'$     5.9880     $\nu_1'$                                      5.9880    $\mu_1'$     5.9994         6.0000
  $\nu_2'$    42.6900     $\nu_2' - \tfrac{1}{12}$                     42.6067    $\mu_2'$    42.8570        42.8571
  $\nu_3'$   331.0854     $\nu_3' - \tfrac{1}{4}\nu_1'$               329.5884    $\mu_3'$   333.3349       333.3333
  $\nu_4'$  2698.7735     $\nu_4' - \tfrac{1}{2}\nu_2' + \tfrac{7}{240}$  2677.4576    $\mu_4'$  2727.2757      2727.2729
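The "actual values" follow from direct integration: for $y \propto \sqrt{x}$ on $(0, p)$ the $n$-th moment about the origin is $p^n \cdot \tfrac{3}{2}/(n + \tfrac{3}{2})$. The sketch below (hypothetical helper names) evaluates these and applies Sheppard's terms alone to the printed raw moments, as a check on the second and fourth columns of the table.

```python
def exact_parabola_moments(p=10.0, max_order=4):
    """Moments about x = 0 of a density proportional to sqrt(x) on (0, p)."""
    return [p ** n * 1.5 / (n + 1.5) for n in range(1, max_order + 1)]


def sheppard_corrected(nu):
    """Apply Sheppard's terms (unit subrange) to raw moments nu1'..nu4'."""
    nu1, nu2, nu3, nu4 = nu
    return [nu1, nu2 - 1 / 12, nu3 - nu1 / 4, nu4 - nu2 / 2 + 7 / 240]


printed_raw = [5.9880, 42.6900, 331.0854, 2698.7735]  # raw moments as printed
exact = exact_parabola_moments()
shep = sheppard_corrected(printed_raw)
```

Every Sheppard-corrected value lies below the corresponding exact value, which is the point made in the text that follows.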


It will be seen that the fully corrected results are in most excellent accord with
the actual values. Sheppard's corrections, although component parts of the general
corrections, move if taken alone in the wrong direction, i.e. they lower moments, all
of which need to be raised. Thus while Sheppard's correction lowers the fourth
moment by about 21, our new corrections raise it by about 50, the result being the
requisite raising by 29.

It seems to us unlikely that a more unfavourable case for our abruptness
coefficients could be found. It certainly emphasises the point that to obtain very
good results it is quite unnecessary for the terminal ordinates and differential
coefficients of the actual curve of frequency and the auxiliary terminal curve to be
closely identical.

* The $a'$'s and the $b'$'s will in this case be equal to the a's and b's.


(7) Illustration II. Now let us take a normal curve containing 1,000,000
individuals with a standard deviation of unity, and let us suppose the frequency
grouped on 0.5 × standard deviation subranges, the first such subrange being central.
Then, adjusting to units, we have the following system:

   - .25 to + .25      197,414
   + .25 to + .75      174,666
   + .75 to +1.25      120,977
   +1.25 to +1.75       65,591
   +1.75 to +2.25       27,834
   +2.25 to +2.75        9,245
   +2.75 to +3.25        2,402
   +3.25 to +3.75          489
   +3.75 to +4.25           78
   +4.25 to +4.75           10
   +4.75 to +5.25            1

To test the error introduced by our adjustments, take second moments for the
complete curve about the centre of the group from -.25 to +.25. We have

   $\nu_1'$ = 0, $\nu_2'$ = 4.083,394.

Using Sheppard's correction, as abruptness coefficients are zero, we have

   $\mu_2$ = 4.000,061 in working units,
          = 1.000,0152 in actual units.

Accordingly $\sigma$ = 1.000,008, which is a quite good approximation to unity. The
error introduced by our adjustments for omitted decimals is therefore not great.
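This check is easily repeated from the printed frequencies. The sketch below uses only the table above (the half-distribution, mirrored about the central group); the variable names are illustrative.

```python
# Central group, then the groups at +1, +2, ... working units (0.5 s.d. each).
CENTRAL = 197_414
SIDE = [174_666, 120_977, 65_591, 27_834, 9_245, 2_402, 489, 78, 10, 1]


def grouped_second_moment():
    """Total frequency and raw second moment about the central group."""
    total = CENTRAL + 2 * sum(SIDE)
    nu2 = 2 * sum(f * k * k for k, f in enumerate(SIDE, start=1)) / total
    return total, nu2


total, nu2 = grouped_second_moment()
mu2_working = nu2 - 1 / 12            # Sheppard's correction, unit subrange
mu2_actual = mu2_working * 0.5 ** 2   # working unit = 0.5 actual units
sigma = mu2_actual ** 0.5
```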
(a) We will start first with the singly truncated normal curve given below,
with $h_1 = h$, i.e. the area from $x = 1.25$ onwards,

                          Moments about stump
   Frequencies            (in working units)
       65,591             $\nu_1'$ =  1.029,513
       27,834             $\nu_2'$ =  1.693,994
        9,245             $\nu_3'$ =  3.883,416
        2,402             $\nu_4'$ = 10.974,937
          489
           78
           10
            1
   Total 105,650

and determine its moment-coefficients, as a frequency curve having high contact at
one end and marked abruptness at the other. In this case all the b's are zero and
we only need to find the a's. Clearly $a' = a$.
Using (iii) we determine from

   $n_1'$ = .6208,3294   the values:   $a_1$ = -.878,708,
   $n_2'$ = .2634,5480                 $a_2$ =  .607,984,
   $n_3'$ = .0875,0592                 $a_3$ = -.296,843,
   $n_4'$ = .0227,3545                 $a_4$ =  .081,723,
   $n_5'$ = .0046,2849                 $a_5$ =  .005,736.

Whence

   $a_1' - \tfrac{1}{60}a_3' + \tfrac{1}{2520}a_5'$ = -.873,758,    $a_2' - \tfrac{5}{126}a_4'$ = +.604,741,
   $a_1' - \tfrac{5}{63}a_3' + \tfrac{1}{240}a_5'$ = -.855,125,     $a_2' - \tfrac{7}{80}a_4'$ = +.600,833.

The corrections due to these abruptness coefficients by (xxii)-(xxv) are for $\nu_1'$, $\nu_2'$,
$\nu_3'$ and $\nu_4'$ respectively

   -.072,813, -.005,040, +.021,378 and +.004,769,

while the corresponding Sheppard's corrections are

   0, -.083,333, -.257,378 and -.817,830.
Thus we deduce for moment-coefficients about stump:

                              With Sheppard's
   Raw moment                 corrections        Full corrections
   $\nu_1'$    1.029,513       1.029,513            .956,700
   $\nu_2'$    1.693,994       1.610,661           1.605,621
   $\nu_3'$    3.883,416       3.626,038           3.647,416
   $\nu_4'$   10.974,937      10.157,107          10.161,876

These values are in working units = .5 actual units. Hence in actual units
we have

                          With Sheppard's                                         With full
   Raw moments            corrections                                             corrections            True values
   $\nu_1'$   .514,756    $\nu_1'$                                     .514,756    $\mu_1'$   .478,350     .478,8131
   $\nu_2'$   .423,498    $\nu_2' - \tfrac{1}{12}$                     .402,665    $\mu_2'$   .401,405     .401,4837
   $\nu_3'$   .485,427    $\nu_3' - \tfrac{1}{4}\nu_1'$                .453,255    $\mu_3'$   .455,927     .455,7714
   $\nu_4'$   .685,934    $\nu_4' - \tfrac{1}{2}\nu_2' + \tfrac{7}{240}$   .634,819    $\mu_4'$   .635,117     .634,7360

It will be seen from these results that our full correction values for the moments
about the stump are in every case accurate to 1 in the 1000, while, if Sheppard's
corrections only are made, we may be out nearly 1 in the 100. The change in the
mean, second and third moment-coefficients is very noteworthy. In the case of the
fourth moment we are out .0004 in .6347, while the Sheppard's correction alone is
out only .0001 in .6347. The cause of this irregularity we have not been able to
detect, although we have examined carefully the whole of our arithmetic. It
seemed accordingly worth while inquiring what differences would occur when the
moment-coefficients were taken about the mean and not about the stump.


   Raw moments            With Sheppard's                                                    With full
   about mean             corrections                                                        corrections           True values
   $\nu_2$   .158,524     $\nu_2 - \tfrac{1}{12}(.5)^2$                          .137,691     $\mu_2$   .172,587    .172,222
   $\nu_3$   .104,226     $\nu_3$                                               .104,226     $\mu_3$   .098,801    .098,612
   $\nu_4$   .149,090     $\nu_4 - \tfrac{1}{2}\nu_2(.5)^2 + \tfrac{7}{240}(.5)^4$   .131,097     $\mu_4$   .156,767    .156,405

It will be seen that now Sheppard's corrections are wholly inadequate and our
corrections are essential, even in the case of the fourth moment-coefficient. This
confirms the view of Sheppard himself, who insisted on the importance of high
contact at the terminals, if they are to be used alone. It is a convincing illustration
of the fallacy of those "proofs" of Sheppard's corrections which do not appeal to
the principle of high terminal contact.

We now propose to illustrate the degree of improvement in the exactness
obtained, if we calculate the abruptness coefficients on smaller subranges. Accordingly
we break up the terminal group 65591 on .5σ base into five groups each on
.1σ base. These are

   17142   leading to   $n_1'$ = .1622,5272   and   $a_1$ = -.1728,9281,
   14979                $n_2'$ = .1417,7946         $a_2$ =  .0216,4780,
   12959                $n_3'$ = .1226,5973         $a_3$ = -.0010,4603,
   11099                $n_4'$ = .1050,5442         $a_4$ = -.0002,3651,
    9412                $n_5'$ = .0890,8661         $a_5$ =  .0000,3781,

whence, remembering $a_s' = (h/h_1)^s a_s = 5^s a_s$, we find

   $a_1'$ = -.864,464,   and   $\tfrac{1}{12}(a_1' - \tfrac{1}{60}a_3' + \tfrac{1}{2520}a_5')$ = -.071,853,
   $a_2'$ =  .541,195,         $-\tfrac{1}{120}(a_2' - \tfrac{5}{126}a_4')$ = -.004,559,
   $a_3'$ = -.130,754,         $-\tfrac{1}{40}(a_1' - \tfrac{5}{63}a_3' + \tfrac{1}{240}a_5')$ = .021,351,
   $a_4'$ = -.147,818,         $\tfrac{1}{126}(a_2' - \tfrac{7}{80}a_4')$ = .004,398,
   $a_5'$ =  .118,156.
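The rescaling step $a_s' = (h/h_1)^s a_s$ is simple enough to sketch. The helper name below is hypothetical; the inputs are the coefficients found on the .1σ subranges, with $h/h_1 = 5$.

```python
def rescale_abruptness(coeffs, ratio):
    """Scale a_1..a_5 found on terminal subranges h_1 to the working subrange h.

    ratio is h / h_1; the s-th coefficient is multiplied by ratio ** s.
    """
    return [a * ratio ** s for s, a in enumerate(coeffs, start=1)]


a_small = [-0.17289281, 0.02164780, -0.00104603, -0.00023651, 0.00003781]
a_scaled = rescale_abruptness(a_small, 5)
```

The higher coefficients are magnified most ($5^5 = 3125$ for $a_5$), which is why many decimals must be carried in the $a$'s.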


Thus the moment-coefficients become

   $\mu_1'$ =   .957,660   or in actual units   .478,830,
   $\mu_2'$ =  1.606,102                        .401,526,
   $\mu_3'$ =  3.647,389                        .455,924,
   $\mu_4'$ = 10.161,505                        .635,094.

Transferring to the mean we have

               On .5 subranges    On .1 subranges    Actual values
   $\mu_1'$       .478,350           .478,830           .478,813
   $\mu_2$        .172,587           .172,248           .172,222
   $\mu_3$        .098,801           .098,705           .098,612
   $\mu_4$        .156,767           .156,516           .156,405

While the first column of values would be amply adequate for most statistical
purposes, the second makes a still closer approximation to the actual values, the
differences being only

              +.000,017, +.000,026, +.000,093, +.000,111
   as against -.000,463, +.000,365, +.000,189, +.000,362

respectively. The greatest improvements are in the mean and standard deviation.
Accordingly it is well worth using smaller terminal subranges, if they are available
as in the cases of cricket scores, wages, house values, infant mortality and other
frequency material.
(8) Illustration II. (b) We now propose to consider the moment-coefficients
of a doubly truncated normal curve. We will take the portion of the above 1,000,000
distribution with unit standard deviation from variate value 1.25 to variate value
3.75 and divide it into five groups, i.e.

           Absolute        Relative
           frequencies     frequencies
             65,591         .6213,5637
             27,834         .2636,7693
              9,245         .0875,7969
              2,402         .0227,5462
                489         .0046,3239
   Total    105,561   Total 1.0000,0000





Using (iii) and (iv), which now involve all five groups, we find

   $a_1$ = -.8794,4917,     $b_1$ = -.0038,5527,
   $a_2$ =  .6084,9651,     $b_2$ =  .0258,2393,
   $a_3$ = -.2970,9362,     $b_3$ = -.0401,0477,
   $a_4$ =  .0817,9157,     $b_4$ =  .0530,8779,
   $a_5$ = -.0057,4076,     $b_5$ =  .0057,4076.

From these results, since the $a'$'s equal the a's and the $b'$'s the b's, we have for the
abruptness functions:

   $a_1' - \tfrac{1}{60}a_3' + \tfrac{1}{2520}a_5'$ = -.8744,9989,    $b_1' - \tfrac{1}{60}b_3' + \tfrac{1}{2520}b_5'$ = -.0031,8458,
   $a_2' - \tfrac{5}{126}a_4'$ = .6052,5081,                $b_2' - \tfrac{5}{126}b_4'$ = .0237,1727,
   $a_1' - \tfrac{5}{63}a_3' + \tfrac{1}{240}a_5'$ = -.8558,9423,     $b_1' - \tfrac{5}{63}b_3' + \tfrac{1}{240}b_5'$ = -.0006,4843,
   $a_2' - \tfrac{7}{80}a_4'$ = .6013,3975,                 $b_2' - \tfrac{7}{80}b_4'$ = .0211,7875.


About the first terminal we have for the raw moment-coefficients

   $\nu_1'$ = 1.025,630, $\nu_2'$ = 1.668,535, $\nu_3'$ = 3.733,743, $\nu_4'$ = 10.108,966,

and by (xxii)-(xxv) the corresponding corrective terms are

   -.0731,4037, -.0074,9994, +.0044,7459, -.0981,1557,

leading to the Sheppard's correction moment-coefficients in actual units:

   $\mu_1'$ = .512,815, $\mu_2'$ = .396,300, $\mu_3'$ = .434,667, $\mu_4'$ = .581,492,

and the full correction moment-coefficients:

   $\mu_1'$ = .476,245, $\mu_2'$ = .394,425, $\mu_3'$ = .435,226, $\mu_4'$ = .575,360.

We now transfer to the mean of the block and find

   $\mu_1'$ = .476,245, $\mu_2$ = .167,616, $\mu_3$ = .087,7305, $\mu_4$ = .128,691,

while the values for the Sheppard's corrections only would be

   $\mu_1'$ = .512,815, $\mu_2$ = .133,321, $\mu_3$ = .104,701, $\mu_4$ = .107,714.

The theoretical values for the normal curve block are

   $\mu_1'$ = .476,930, $\mu_2$ = .168,025, $\mu_3$ = .089,730, $\mu_4$ = .133,748.
It will be seen that the Sheppard's corrections alone give very unsatisfactory
results, and that while the full corrections for the first three moments are statistically
satisfactory, the approximation of the fourth moment-coefficient would not for
certain investigations be adequate. We are in fact using only five groups and
trusting to these for the accuracy of our abruptness coefficients. We will accordingly
now test what improvement arises when we divide our terminal groups into five
subgroups and calculate the abruptness coefficients on these smaller subranges. Thus
we have $h = .5$, $h_1 = h_p = .1$, and therefore $h/h_1 = h/h_p = 5$. Our subgroups are:

   $n_1$ = 17142   therefore   $n_1'$ = .1623,8952,      $n_{p-4}$ = 173   therefore   $n_{p-4}'$ = .0016,3886,
   $n_2$ = 14979               $n_2'$ = .1418,9900,      $n_{p-3}$ = 124               $n_{p-3}'$ = .0011,7468,
   $n_3$ = 12959               $n_3'$ = .1227,6814,      $n_{p-2}$ =  88               $n_{p-2}'$ = .0008,3364,
   $n_4$ = 11099               $n_4'$ = .1051,4300,      $n_{p-1}$ =  61               $n_{p-1}'$ = .0005,7786,
   $n_5$ =  9412               $n_5'$ = .0891,6172,      $n_p$     =  43               $n_p'$     = .0004,0735.
           65591                                                      489

Whence

   $a_1$ = -.1730,3848   and   $a_1'$ = -.8651,9242,     $b_1$ =  .0003,5810   and   $b_1'$ =  .0017,9049,
   $a_2$ = +.0216,6599         $a_2'$ = +.5416,4969,     $b_2$ =  .0000,5368         $b_2'$ =  .0013,4204,
   $a_3$ = -.0010,4679         $a_3'$ = -.1308,4851,     $b_3$ =  .0001,5157         $b_3'$ =  .0189,4639,
   $a_4$ = -.0002,3683         $a_4'$ = -.1480,1868,     $b_4$ = -.0000,7579         $b_4'$ = -.0473,6598,
   $a_5$ = +.0000,3789         $a_5'$ = +.1184,1494,     $b_5$ =  .0000,3789         $b_5'$ =  .1184,1494.

Determining the abruptness functions from these values, we have

   $a_1' - \tfrac{1}{60}a_3' + \tfrac{1}{2520}a_5'$ = -.8629,6462,    $b_1' - \tfrac{1}{60}b_3' + \tfrac{1}{2520}b_5'$ = +.0015,2171,
   $a_2' - \tfrac{5}{126}a_4'$ = .5475,9345,                $b_2' - \tfrac{5}{126}b_4'$ = +.0032,2164,
   $a_1' - \tfrac{5}{63}a_3' + \tfrac{1}{240}a_5'$ = -.8543,1522,     $b_1' - \tfrac{5}{63}b_3' + \tfrac{1}{240}b_5'$ = +.0007,8021,
   $a_2' - \tfrac{7}{80}a_4'$ = .5604,7508,                 $b_2' - \tfrac{7}{80}b_4'$ = +.0054,8656.

Working out the corrective terms for abruptness we find them

   -.071,787, -.003,368, +.031,252, +.071,446

in working units, leading to

   $\mu_1'$ = .476,922, $\mu_2'$ = .395,458, $\mu_3'$ = .438,5735, $\mu_4'$ = .585,957,

or transferring to the mean we have

   $\mu_1'$ = .476,922, $\mu_2$ = .168,008, $\mu_3$ = .089,722, $\mu_4$ = .133,783,

as against the actual values

   $\mu_1'$ = .476,930, $\mu_2$ = .168,025, $\mu_3$ = .089,730, $\mu_4$ = .133,743,

an eminently satisfactory agreement. It is thus clear that when possible it is
desirable to obtain the abruptness corrections by small subranges, in this case $\tfrac{1}{10}$ of
the standard deviation. Hence any terminal small range groupings such as are
frequently provided in statistical data are useful from this standpoint. In fact if
the abruptness coefficients are found from such small groupings the remaining
subranges can safely be made fairly coarse, as in the above examples, where five
divisions of the total range are clearly adequate.


(9) Illustration III. Mean Age and Variability of Infants at Death. It is 
very important in practical statistics to obtain the mean and standard deviation of 
J-shaped curves. A good illustration of such curves may be found in infantile 
mortality statistics. These have the advantage that in the early part of the year 
of infancy the frequencies are in certain cases given by much smaller intervals. 
Thus in the Prussian official statistics they are given for the first fortnight by days. 
Professor Raymond Pearl in a paper of 1906 (Biometrika, Vol. IV, p. 510) has
endeavoured to ascertain the mean age at death of infants in the first year of life 
from the Prussian data. It will be of interest to determine what changes are likely 
to be made in his results by the use of our present abruptness corrections. He 
writes (p. 512): 

It is evident that the grouping here [i.e. in the Prussian data] is sufficiently fine to make 
possible a very accurate determination of the mean age of death.... A standard month of
30 days was assumed : then with a unit of 30 days the first and second moment-coefficients about 
an arbitrary axis were determined. From these the position of the mean and the value of the 
second moment about it were easily found. Only the “rough” second moment was calculated, as 
it was deemed sufficiently accurate for present purposes, and furthermore it was difficult to deter- 
mine the proper corrective terms to apply in this case. In the calculations each frequency 
element was for practical convenience centred at the midpoint of its range. The error made by 
so doing is negligible. 

With our present corrections we can test how far the errors made by concentra- 
tion at the midpoints of the subranges are really negligible. It is certainly right 
to concentrate at those points provided we allow for terminal abruptness which is 
very marked in this case. If we make the proper terminal corrections theory shows 
that quite considerable subranges, say in this case one month, may be used to 
determine the raw moments. It will be sufficient to illustrate the method on the
Prussian male infant deaths. 


We have deaths per 1000 infants born:

   Months      Deaths
    0-1         63.99
    1-2         22.59
    2-3         18.58
    3-4         15.96
    4-5         13.30
    5-6         11.51
    6-7         10.61
    7-8          9.30
    8-9          8.74
    9-10         8.29
   10-11         7.51
   11-12         6.94
   Total       197.32

   $\nu_1'$ =  3.759,224
   $\nu_2'$ = 25.809,801   in months.

For the birth terminal we have*:

   Days        Deaths
    0-3         18.25
    3-6          6.58
    6-9          7.89
    9-12         5.65
   12-15         5.82

* Three day intervals taken with a view to smoothing anomalous values.

The month subranges will be quite adequate at the childhood terminal.


As the results are based on 1877-1881 averages, we shall suppose the month
to be 30.4375 days. Thus $h/h_1$ = 10.145,833, $h/h_p$ = 1. We find

   $n_1'$ = .0924,8936,   $a_1$ = -.1877,2637,   $a_1'$ = -     1.904,640,
   $n_2'$ = .0333,4685,   $a_2$ =  .2966,9657,   $a_2'$ =      30.541,331,
   $n_3'$ = .0399,8581,   $a_3$ = -.3909,0057,   $a_3'$ = -   408.253,057,
   $n_4'$ = .0286,3369,   $a_4$ =  .3117,2715,   $a_4'$ =    3303.128,589,
   $n_5'$ = .0294,9524,   $a_5$ = -.1139,7730,   $a_5'$ = - 12253.408,881;

   $n_{p-4}'$ = .0471,3156,   $b_1' = b_1$ =  .035,759,
   $n_{p-3}'$ = .0442,9353,   $b_2' = b_2$ = -.004,823,
   $n_{p-2}'$ = .0420,1297,   $b_3' = b_3$ =  .013,861,
   $n_{p-1}'$ = .0380,6000,   $b_4' = b_4$ = -.012,670,
   $n_p'$     = .0351,7130,   $b_5' = b_5$ =  .004,967.

From these we deduce

   $\tfrac{1}{12}(a_1' - \tfrac{1}{60}a_3' + \tfrac{1}{2520}a_5')$ = .003,093,    $\tfrac{1}{12}(b_1' - \tfrac{1}{60}b_3' + \tfrac{1}{2520}b_5')$ = .002,961,
   $\tfrac{1}{120}(a_2' - \tfrac{5}{126}a_4')$ = -.837,793,             $\tfrac{1}{120}(b_2' - \tfrac{5}{126}b_4')$ = -.000,036,

whence
   Total abruptness correction on $\nu_1'$ = .006,054,
     "        "          "      " $\nu_2'$ = .843,679.

Thus $\mu_1'$ = 3.765,278 months, $\mu_2'$ = 26.570,147 (months)$^2$,
using of course Sheppard's correction.

Finally we reach

   Mean               = 114.61 days as against* 113.07 days,
   Standard Deviation = 107.15   "   "    "     105.44   "

obtained from taking the raw moments of small elements of one day up to the end
of the first fortnight. Thus, if we desire to get a mean within 1.5% of the correct
value, it will be well to adopt abruptness corrections.
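The raw moments and the final conversion to days can be retraced numerically. This sketch assumes the monthly table as reconstructed above (with 8.74 for months 8-9, the value required by the printed total of 197.32) and takes the corrected moments $\mu_1'$ and $\mu_2'$ as printed; the helper names are hypothetical.

```python
import math

MONTHLY_DEATHS = [63.99, 22.59, 18.58, 15.96, 13.30, 11.51,
                  10.61, 9.30, 8.74, 8.29, 7.51, 6.94]
DAYS_PER_MONTH = 30.4375


def raw_moments_in_months(deaths):
    """Raw first and second moments about birth, groups at month midpoints."""
    total = sum(deaths)
    nu1 = sum(d * (m + 0.5) for m, d in enumerate(deaths)) / total
    nu2 = sum(d * (m + 0.5) ** 2 for m, d in enumerate(deaths)) / total
    return total, nu1, nu2


def mean_and_sd_in_days(mu1, mu2):
    """Convert corrected moments about birth (in months) to days."""
    sd_months = math.sqrt(mu2 - mu1 ** 2)
    return mu1 * DAYS_PER_MONTH, sd_months * DAYS_PER_MONTH


total, nu1, nu2 = raw_moments_in_months(MONTHLY_DEATHS)
mean_days, sd_days = mean_and_sd_in_days(3.765278, 26.570147)
```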


* Pearl's results modified by taking the average month to be 30.4375, not 30 days.

(10) Illustration IV. In view of the fact that in the previous illustration the
infantile death-rate curve has probably an infinite initial ordinate it seems well to
measure, in a case which can be tested, the degree with which our corrections give
the actual values of the moment-coefficients in such a case.

We choose the curve $y = \tfrac{1}{2}x^{-\frac{1}{2}}$ and suppose ten subranges going up to the terminal
$x = 10$, from $x = 0$. We have for the "frequencies":

   x             Frequency
   0 to 1        1.000,0000
   1 to 2         .414,2136
   2 to 3         .317,8372
   3 to 4         .267,9492
   4 to 5         .236,0680
   5 to 6         .213,4217
   6 to 7         .196,2616
   7 to 8         .182,6758
   8 to 9         .171,5729
   9 to 10        .162,2777
   Total         3.162,2777

Further, for small terminal subranges, we have:

   x             Frequency
   0 to .2        .447,2136
   .2 to .4       .185,2419
   .4 to .6       .142,1412
   .6 to .8       .119,8305
   .8 to 1.0      .105,5728

It will be sufficient to take the subranges unity at the other terminal, or $h/h_1$ = 5,
$h/h_p$ = 1. Thus we have

   $\nu_1'$ = 3.394,907,   $\nu_2'$ = 20.016,109,

   $n_1'$ = .1414,2136,   $a_1$ = -.2332,9561,   $a_1'$ = -  1.166,4781,
   $n_2'$ = .0585,7863,   $a_2$ = +.2583,1707,   $a_2'$ = +  6.457,9263,
   $n_3'$ = .0449,4899,   $a_3$ = -.2657,4026,   $a_3'$ = - 33.217,5325,
   $n_4'$ = .0378,9373,   $a_4$ = +.1798,6052,   $a_4'$ = +112.412,8250,
   $n_5'$ = .0333,8505,   $a_5$ = -.0586,1091,   $a_5'$ = -183.159,0938.

                                                        Actual values
   $n_{p-4}'$ = .0674,8987,   $b_1' = b_1$ = +.050,0105    +.0500,0000,
   $n_{p-3}'$ = .0620,6337,   $b_2' = b_2$ = +.002,4535    +.0025,0000,
   $n_{p-2}'$ = .0577,6716,   $b_3' = b_3$ = +.000,4817    +.0003,7500,
   $n_{p-1}'$ = .0542,5612,   $b_4' = b_4$ = -.000,0497    +.0000,9375,
   $n_p'$     = .0513,1672,   $b_5' = b_5$ = +.000,1316    +.0000,3281.

These values of a's and b's lead to the abruptness functions:

   $\tfrac{1}{12}(a_1' - \tfrac{1}{60}a_3' + \tfrac{1}{2520}a_5')$ = -.057,1279,    $-\tfrac{1}{120}(a_2' - \tfrac{5}{126}a_4')$ = -.016,6425,
   $\tfrac{1}{12}(b_1' - \tfrac{1}{60}b_3' + \tfrac{1}{2520}b_5')$ =  .004,1669,    $\tfrac{1}{120}(b_2' - \tfrac{5}{126}b_4')$ = .000,0205.

Accordingly we find

   $\mu_1'$ = 3.394,9066 - .052,9616 = 3.341,9450
and
   $\mu_2'$ = 20.016,1090 - .083,3333 + .066,7155 = 19.999,4912,

which gives

   $\mu_2$ = 8.830,8953.

For comparison we have

                               Using only
                               Sheppard's
              Raw moments      corrections     Full corrections    True values
   $\mu_1'$      3.3949           3.3949            3.3419            3.3333
   $\mu_2'$     20.0161          19.9328           19.9995           20.0000
   $\mu_2$       8.4907           8.4074            8.8309            8.8889
   $\sigma$      2.9139           2.8995            2.9718            2.9814


It will be seen that Sheppard's corrections alone are worse than the raw moment
results. In other words they should certainly not be used alone for J-shaped curves;
it would be better to take the raw moment results without any corrections. On the
other hand the full corrections even in this extreme case, where (a) the Euler-
Maclaurin Theorem fails theoretically, (b) our auxiliary curve is unreasonable, for
$x^{-\frac{1}{2}}$ cannot be expanded at the origin terminal in powers of $x$, are found to give
results within ½% of the true values for both mean and standard deviation.

The variety of illustrations we have taken seems to suggest that for most 
practical statistical problems—even with J- or U-shaped distributions—we shall 
obtain reasonable results from the system developed in the first part of this paper. 
At the same time the method adopted indicates that for the best possible results in 
asymptotic frequency curves it may be needful to use a more suitable auxiliary curve 
for the asymptotic terminal. This leads us directly to the second part of our paper. 


Part II. Cases of Asymptotic Frequency.

(11) In selecting our auxiliary curve to give the first five frequencies we must
remember that it has (i) to give an infinite ordinate but a finite frequency, (ii) it
must be of such a character that its constants can be readily determined.

If we adopt

$$Z = N\left(1 + x^q\,(A + Bx + Cx^2 + Dx^3 + Ex^4)\right),$$

where $q$ is chosen less than unity, we have the adequate number of constants and
$y = -dZ/dx$ is infinite when $x = 0$.

If we leave $q$ undetermined, however, we should have six not five constants and
might then omit $E$. But the process of determining $A$, $B$, $C$, $D$ and $q$ would be
very laborious and involve a troublesome series of approximations. We are accordingly
thrown back on the retention of $E$ and an arbitrary choice of $q$. Clearly
to give an infinite ordinate and finite area we may give $q$ any value from slightly
over zero to slightly under unity, and the size of $q$ measures so to speak the intensity
of the asymptoting. This is probably rather an important feature of the frequency
curve, but as we see no way of determining it accurately without very great labour,
we give $q$ its mean value $\tfrac{1}{2}$. Accordingly our problem becomes that of determining
$A$, $B$, $C$, $D$ and $E$ so as to give the first five frequencies or the values of $Nn_1'$, $Nn_2'$,
$Nn_3'$, $Nn_4'$, $Nn_5'$ as before. After a good deal of work they are found to be

$$\begin{aligned}
A &= -1.6496484755\,n_1' + 3.3503515245\,n_2' - 3.7207162874\,n_3' + 2.0527864045\,n_4' - 0.4472135955\,n_5',\\
B &= +0.9132876419\,n_1' - 5.5033790247\,n_2' + 7.1066919065\,n_3' - 4.1516383427\,n_4' + 0.9316949906\,n_5',\\
C &= -0.3131772759\,n_1' + 2.6451560574\,n_2' - 4.3080606243\,n_3' + 2.7644801733\,n_4' - 0.6521864934\,n_5',\\
D &= +0.0529917797\,n_1' - 0.5303415536\,n_2' + 1.0017231390\,n_3' - 0.7303276686\,n_4' + 0.1863389981\,n_5',\\
E &= -0.0034536703\,n_1' + 0.0382129964\,n_2' - 0.0796381338\,n_3' + 0.0646994335\,n_4' - 0.0186338998\,n_5'.
\end{aligned} \tag{xxviii}$$

The large number of decimals is requisite owing to the high coefficients they have
to be multiplied by in ascertaining the values of the abruptness coefficients.
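The constants in (xxviii) can be recovered numerically. The sketch below assumes, as the definition of $Z$ suggests, that $Z(k)/N = 1 - (n_1' + \dots + n_k')$ at $k = 1, \dots, 5$ with $q = \tfrac{1}{2}$, and solves the resulting five linear equations; the solver and function names are hypothetical and only the standard library is used.

```python
import math


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(matrix)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def auxiliary_curve_coefficients(q=0.5, groups=5):
    """Rows are A, B, C, D, E; column j is the multiplier of n'_(j+1)."""
    # k**q * (A + B k + ... + E k**4) = -(n_1' + ... + n_k'), k = 1..groups
    matrix = [[k ** (q + m) for m in range(groups)] for k in range(1, groups + 1)]
    columns = []
    for j in range(groups):
        rhs = [-1.0 if k >= j + 1 else 0.0 for k in range(1, groups + 1)]
        columns.append(solve_linear(matrix, rhs))
    return [[columns[j][i] for j in range(groups)] for i in range(groups)]


rows = auxiliary_curve_coefficients()
```

For example the multiplier of $n_5'$ in $A$ comes out as $-1/\sqrt{5} = -.4472135955$, in agreement with the first line of (xxviii).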
Now our scheme of action is of the following kind: we shall obtain the abruptness
coefficients at $x = 1$, or at the finite ordinate of the first trapezette, for here they
will be finite. We shall then trust to our auxiliary curve to give the moments of
this trapezette about $x = 1$, using the integral

$$n_1\mu_s'' = \int_0^1 (x-1)^s\,y\,dx = -\int_0^1 (x-1)^s\,\frac{dZ}{dx}\,dx.$$

And lastly we shall determine the moments and the corrections of the remainder of
the curve by the process already discussed as if it had to be applied from $x = 1$
onwards*. The moments for the trapezette before $x = 1$, and for the remainder of
the curve, must then be added together to get the total moments and so the moment-
coefficients about $x = 1$. The transference to the centroid then proceeds in the usual
manner.


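Moments of first trapezette $n_1$ about non-infinite ordinate:

$$\begin{aligned}
n_1\mu_1'' &= 2N\left(\tfrac{1}{3}A + \tfrac{1}{5}B + \tfrac{1}{7}C + \tfrac{1}{9}D + \tfrac{1}{11}E\right),\\
n_1\mu_2'' &= -8N\left(\tfrac{1}{15}A + \tfrac{1}{35}B + \tfrac{1}{63}C + \tfrac{1}{99}D + \tfrac{1}{143}E\right),\\
n_1\mu_3'' &= 16N\left(\tfrac{1}{35}A + \tfrac{1}{105}B + \tfrac{1}{231}C + \tfrac{1}{429}D + \tfrac{1}{715}E\right),\\
n_1\mu_4'' &= -128N\left(\tfrac{1}{315}A + \tfrac{1}{1155}B + \tfrac{1}{3003}C + \tfrac{1}{6435}D + \tfrac{1}{12155}E\right).
\end{aligned} \tag{xxix}$$

Again remembering that

$$a_s = \frac{1}{N}\left(\frac{d^sZ}{dx^s}\right)_{x=1},$$

we find

$$\begin{aligned}
a_1 &= \tfrac{1}{2}\,(A + 3B + 5C + 7D + 9E),\\
a_2 &= \tfrac{1}{4}\,(-A + 3B + 15C + 35D + 63E),\\
a_3 &= \tfrac{3}{8}\,(A - B + 5C + 35D + 105E),\\
a_4 &= \tfrac{3}{16}\,(-5A + 3B - 5C + 35D + 315E),\\
a_5 &= \tfrac{3}{32}\,(35A - 15B + 15C - 35D + 315E).
\end{aligned} \tag{xxx}$$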
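The numerical fractions in these two sets of formulae can be checked term by term. The sketch below assumes $Z = N(1 + x^q(A + Bx + \dots))$ with $q = \tfrac{1}{2}$, integrates by parts so that $n_1\mu_s'' = sN\int_0^1 (x-1)^{s-1}x^q(A + Bx + \dots)\,dx$, and evaluates each term as a Beta function; the weights of the $a_s$ are the successive derivatives of $x^{k+q}$ at $x = 1$. Function names are hypothetical.

```python
import math


def trapezette_moment_weight(s, k, q=0.5):
    """Multiplier of N * (k-th constant) in n_1 * mu_s'' (k = 0 for A, 1 for B, ...)."""
    # s * integral_0^1 (x - 1)**(s - 1) * x**(k + q) dx = s * (-1)**(s - 1) * B(k + q + 1, s)
    beta = math.gamma(k + q + 1) * math.gamma(s) / math.gamma(k + q + 1 + s)
    return s * (-1) ** (s - 1) * beta


def abruptness_weight(s, k, q=0.5):
    """Multiplier of the k-th constant in a_s: s-th derivative of x**(k + q) at x = 1."""
    weight = 1.0
    for j in range(s):
        weight *= k + q - j
    return weight
```

Thus `trapezette_moment_weight(2, 0)` returns $-8/15$, the multiplier of $A$ in the second moment, and `abruptness_weight(3, 0)` returns $3/8$, the multiplier of $A$ in $a_3$.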


If we now substitute (xxviii) in (xxix) and (xxx) we shall obtain the moments
of the first trapezette and the abruptness coefficients at $x = 1$ in terms of the first
five sub-frequencies. We have

$$\begin{aligned}
n_1'\mu_1'' &= -0.8127818\,n_1' + 0.6770691\,n_2' - 0.6605497\,n_3' + 0.3471889\,n_4' - 0.0737827\,n_5',\\
n_1'\mu_2'' &= \phantom{-}0.7067407\,n_1' - 0.8241137\,n_2' + 0.8305586\,n_3' - 0.4415218\,n_4' + 0.0943572\,n_5',\\
n_1'\mu_3'' &= -0.6347502\,n_1' + 0.8572689\,n_2' - 0.8807900\,n_3' + 0.4714747\,n_4' - 0.1011083\,n_5',\\
n_1'\mu_4'' &= \phantom{-}0.5814517\,n_1' - 0.8541149\,n_2' + 0.8888688\,n_3' - 0.4780407\,n_4' + 0.1027607\,n_5',
\end{aligned} \tag{xxxi}$$

and again

$$\begin{aligned}
a_1 &= -0.0679063\,n_1' - 1.6512396\,n_2' + 1.1771875\,n_3' - 0.5548634\,n_4' + 0.1118034\,n_5',\\
a_2 &= \phantom{-}0.8322458\,n_1' + 0.9155792\,n_2' - 2.3842525\,n_3' + 1.3685243\,n_4' - 0.2981424\,n_5',\\
a_3 &= -0.9887796\,n_1' + 2.8237204\,n_2' - 2.1260271\,n_3' + 0.4720491\,n_4' - 0.0279508\,n_5',\\
a_4 &= \phantom{-}2.4976496\,n_1' - 9.9398504\,n_2' + 13.3946734\,n_3' - 7.8229490\,n_4' + 1.6770510\,n_5',\\
a_5 &= -7.4134958\,n_1' + 25.3208792\,n_2' - 33.8993138\,n_3' + 20.7685399\,n_4' - 4.8564601\,n_5'.
\end{aligned} \tag{xxxii}$$

* The abruptness coefficients in the previous case were determined from the five frequencies following
the initial ordinate; here they are found from the four frequencies following and the one preceding it.

Here as before $n_s' = n_s/N$.

We have accordingly to add the values given by (xxxi) to the expressions for
the moments for the remainder of the frequency corrected for the abruptness by
means of the series (xxxii). We propose to illustrate our results on one or two
numerical examples.


(12) Illustration V. The following data provide the years of survival for
10,000 persons, male and female, born in England and Wales with congenital
malformations*.

   Age at death        Male      Female
   Years  0-1          8762       8753
          1-2           393        339
          2-3           140        150
          3-4            95         80
          4-5            86         69
          5-10          185        184
         10-15           90        132
         15-20           86         86
         20-25           63         52
         25-30           45         40
         30-35            9         40
         35-40           18         17
         40-45            9          6
         45-50            9         23
         50-55            5         11
         55-60            -          6
         60-65            5          -
         65-70            -          6
         70-75            -          6
   Totals            10,000     10,000








* Registrar-General's Annual Report, p. 207, 1913.

Now consider how we should endeavour to find the mean and standard deviation
of such series under the old method. We clearly cannot use Sheppard's corrections.
If we concentrate the deaths in the first year of life at 0.5, we shall certainly get
too high a mean. Now Pearl has shown by taking Prussian statistics (Biometrika,
Vol. IV, p. 515) that as deduced from data registered at short intervals of days, the
mean of the total population of infants dying in the first year of life should be
concentrated at 0.3 instead of 0.5 year of life. But our infants, with congenital
malformations, undoubtedly die earlier than the great bulk of normal infants. We
might therefore hazard a concentration at 0.2; but this would be mere guesswork *,
and what is more would not provide the proper corrections for concentrating in the
case of other years of life. We obtain, however, by this process the following
results:














                                      Male                  Female
First year concentrated at:       0.2       0.3         0.2       0.3
Mean                            1.2436    1.3313      1.5077    1.5952
Standard Deviation              4.5932    4.5734      5.7750    5.7532
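The crude figures in this table can be checked with a few lines of code. The following is a minimal sketch, assuming (as the printed values suggest) that every group other than the first is concentrated at its mid-point and that no grouping correction is applied; the names `GROUPS`, `MALE`, `FEMALE` and `crude_mean_sd` are illustrative and not part of the paper.

```python
from math import sqrt

# Age groups (lower, upper) in years: five single years, then five-year groups to 75.
GROUPS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)] + [(a, a + 5) for a in range(5, 75, 5)]
# Deaths per 10,000 born, copied from the table of Illustration V.
MALE = [8762, 393, 140, 95, 86, 185, 90, 86, 63, 45, 9, 18, 9, 9, 5, 0, 5, 0, 0]
FEMALE = [8753, 339, 150, 80, 69, 184, 132, 86, 52, 40, 40, 17, 6, 23, 11, 6, 0, 6, 6]


def crude_mean_sd(counts, first_year_centre):
    """Raw mean and standard deviation with the first year concentrated at a chosen age."""
    centres = [(lo + hi) / 2 for lo, hi in GROUPS]
    centres[0] = first_year_centre  # the a priori guess, e.g. 0.2 or 0.3
    total = sum(counts)
    mean = sum(n * x for n, x in zip(counts, centres)) / total
    second = sum(n * x * x for n, x in zip(counts, centres)) / total
    return mean, sqrt(second - mean ** 2)


if __name__ == "__main__":
    for label, counts in (("Male", MALE), ("Female", FEMALE)):
        for centre in (0.2, 0.3):
            mean, sd = crude_mean_sd(counts, centre)
            print(f"{label:6s} centred at {centre}: mean {mean:.4f}, s.d. {sd:.4f}")
```

Moving the first-year centre from 0.2 to 0.3 shifts the mean by about 0.09 year, which is why the guess matters so much when nearly nine-tenths of the frequency lies in the first group.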





The differences between the 0.2 and the 0.3 results are considerable, and it will
be found from the sequel that the 0.2 results are closest to the corrected results
for both mean and standard deviation in the case of the male and the female.
Indeed a quite reasonable result might have been reached by centring the deaths
in the first year of life at 0.2. But such a priori guesses must be at best risky.
When we proceed to apply our method by cutting off the first year of life, we note
at once that in this case, as in many others of a like J-distribution character,
a grave difficulty arises: starting from the group 1–2 we have not got the
groupings in year or five-year ranges, for we have cut off the first of our five-year
groups. We cannot therefore straight away apply our formulae based on the
Euler-Maclaurin theory for equal subranges. The suggestion that at once occurs
is to take year groupings for our material. This of course would make no change
in the first raw moment $\mu_1'$, which would be the same whether we grouped into
year or five-year subranges, on the supposition that we simply split up our
frequencies into five equal groups for the five-year periods. But there will be
a change for the second and higher moments. For the second moment the total
frequency of the five-year group ($n_x$) centred at $x$ has to be multiplied by $x^2 + 2h^2$,
where $h = \tfrac{1}{5}$ of the subrange = one year, and similar corrections can be easily
obtained for the higher moments. Of course this distribution of each five-year
frequency into five equal one-year frequency groups is not satisfactory, but with
the irregular data as given it is, perhaps, as good a result as we can hope to get,
until official statisticians recognise the difficulty and table their statistics in a manner
to meet it; i.e. in this case it would mean either proceeding by four-year groups
after the 4–5, or giving the 5–6 frequency and then proceeding by five-year
groups 6–11, 11–16, etc.
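The factor $x^2 + 2h^2$ is simply the mean of the squares of the five sub-group centres $x - 2h$, $x - h$, $x$, $x + h$, $x + 2h$. A small sketch of that check follows; the function name and the choice of the male 5–10 group (185 deaths, centred 6.5 years after the end of the first year) are only for illustration.

```python
def split_group_second_moment(n, x, h=1.0, parts=5):
    """Second moment about the origin of n deaths spread equally over
    `parts` adjacent sub-groups of width h whose centres are symmetric about x."""
    offsets = [(k - (parts - 1) / 2) * h for k in range(parts)]
    return sum((n / parts) * (x + d) ** 2 for d in offsets)


if __name__ == "__main__":
    n, x = 185, 6.5
    print(split_group_second_moment(n, x))   # contribution after redistribution
    print(n * (x ** 2 + 2 * 1.0 ** 2))       # n (x^2 + 2 h^2) with h = 1 year
```

The first moment is untouched by the redistribution because the offsets sum to zero, which is the reason the text says $\mu_1'$ is unchanged.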


* Actually our auxiliary curve gives 0.210 for males and 0.205 for females for means of deaths in the
first year of life.

Assuming the legitimacy for the present purposes of this redistribution in year
groups we find for moments round the end of the first year of life:


Males                               Females
$1238\,\mu_1''' = 9446$               $1247\,\mu_1''' = 12079.5$
$1238\,\mu_2''' = 207,007.25$         $1247\,\mu_2''' = 331,545.75$


Again we have for males:

$n_1 = 8762$, and by (xxxii) $1238\,a_1' = -1122.222,8440$,
$n_2 = 393$,  $1238\,a_2' = 3041.534,5373$,
$n_3 = 140$,  $1238\,a_3' = -9518.206,0741$,
$n_4 = 95$,   $1238\,a_4' = 19254.345,0950$,
$n_5 = 86$,   $1238\,a_5' = -58196.492,8841$.

Our abruptness functions are thus found to be
$1238 \times \tfrac{1}{12}\,(a_1' - \tfrac{1}{60}a_3' + \tfrac{1}{2520}a_5') = -83.056,6602$,
$1238 \times \tfrac{1}{120}\,(a_2' - \tfrac{5}{126}a_4') = 18.978,9435$.

These provide for the moments about 1:
$1238\,\mu_1''' = 9446 - 83.056,6602 = 9362.943,3398$,
$1238\,\mu_2''' = 207,007.25 - 103.166,6667 - 18.978,9435 = 206,885.104,3898$.

We now find from (xxxi) the values of
$8762\,\mu_1'' = -6921.345,3000$ and $8762\,\mu_2'' = 5951.033,6815$.

Thus:
$10,000\,\mu_1' = 8762\,\mu_1'' + 1238\,\mu_1''' = 2441.598,0398$,
$10,000\,\mu_2' = 8762\,\mu_2'' + 1238\,\mu_2''' = 212,836.138,0713$,
or $\mu_1' = .244,160$, $\mu_2' = 21.283,614$.

Thus finally the Mean = 1.2442 years and the Standard Deviation = 4.6069 years.
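The last step, pooling the first-year group with the remainder, is plain arithmetic on moments taken about age 1. A hedged sketch is given below; `combine_about_one` is an illustrative name, and its inputs are the frequency-weighted moment sums exactly as printed (for example $8762\,\mu_1''$ and $1238\,\mu_1'''$).

```python
from math import sqrt


def combine_about_one(n_first, first_m1, first_m2, n_rest, rest_m1, rest_m2):
    """Pool two groups whose frequency-weighted first and second moments
    are taken about age 1; return (mean age, standard deviation)."""
    total = n_first + n_rest
    m1 = (first_m1 + rest_m1) / total   # first moment of the whole about age 1
    m2 = (first_m2 + rest_m2) / total   # second moment of the whole about age 1
    return 1.0 + m1, sqrt(m2 - m1 ** 2)


if __name__ == "__main__":
    # Male figures of Illustration V as printed in the text.
    mean, sd = combine_about_one(8762, -6921.3453000, 5951.0336815,
                                 1238, 9362.9433398, 206885.1043898)
    print(f"mean {mean:.4f} years, s.d. {sd:.4f} years")
```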


We now turn to the female deaths and find with the same notation:
$1247\,\mu_1''' = 12079.5$, $1247\,\mu_2''' = 331,545.75$.

Here $n_1 = 8753$, leading by (v) to $1247\,a_1' = -1014.250,5807$,
$n_2 = 339$,  $1247\,a_2' = 2949.801,0796$,
$n_3 = 150$,  $1247\,a_3' = -7980.615,3654$,
$n_4 = 80$,   $1247\,a_4' = 19991.399,2729$,
$n_5 = 69$,   $1247\,a_5' = -60065.060,3135$.

Hence we deduce:
$1247 \times \tfrac{1}{12}\,(a_1' - \tfrac{1}{60}a_3' + \tfrac{1}{2520}a_5') = -75.422,972$,
$1247 \times \tfrac{1}{120}\,(a_2' - \tfrac{5}{126}a_4') = 17.970,765$,
and
$1247\,\mu_1''' = 12004.077,028$,
$1247\,\mu_2''' = 331,545.75 - 103.916,667 - 17.970,765 = 331,423.862,568$.

Again by (xxxi):
$8753\,\mu_1'' = -6961.151,020$, $8753\,\mu_2'' = 6002.499,428$.

Thus:
$\mu_1' = (8753\,\mu_1'' + 1247\,\mu_1''')/10,000 = .504,293$,
$\mu_2' = (8753\,\mu_2'' + 1247\,\mu_2''')/10,000 = 33.742,636$.

Accordingly we have for females:
Mean = 1.5043 years,
Standard Deviation = 5.7869 years.


These are both in fairly good accord with the result that would have been
obtained by the a priori guess of 0.2 for centring the first sub-frequency.


(13) Illustration VI. It is not without interest to inquire if this centring
of 0.2 maintains itself when we turn to other material for congenital malformations.
We can use the material provided in the United States Census for 1899–1900,
Vol. iv, p. 670. From the data there given we deduce that for 10,000 congenitally
malformed individuals of either sex born:


Died in year of life      Males     Females
  0–1                     9626       9543
  1–2                      129        204
  2–3                       61         57
  3–4                       27         49
  4–5                       14         25
  5–10                      54         41
 10–15                      34         41
 15–20                      20          8
 20–25                      14          8
 25–30                      –           8
 30–35                       7         –
 35–40                      –          –
 40–45                      –          –
 45–50                      –          –
 50–55                      –           8
 55–60                      14         –
 60–65                      –          –
 65–70                      –           8

 Totals                 10,000     10,000


We have as before: for males, $374\,\mu_1''' = 2657$, $374\,\mu_2''' = 71127.5$;
for females, $457\,\mu_1''' = 2595.5$, $457\,\mu_2''' = 76280.25$.











ELEANOR PAIRMAN AND KARL PEARSON


We now turn to the abruptness coefficients at the end of the first year of life
and find:

Males

$n_1 = 9626$, and by (xxxii) $374\,a_1' = -808.283,5789$,
$n_2 = 129$,  $374\,a_2' = 3203.644,5476$,
$n_3 = 61$,   $374\,a_3' = -9271.066,1366$,
$n_4 = 27$,   $374\,a_4' = 23389.468,5164$,
$n_5 = 14$,   $374\,a_5' = -69671.015,1599$.

$374 \times \tfrac{1}{12}\,(a_1' - \tfrac{1}{60}a_3' + \tfrac{1}{2520}a_5') = -56.784,418$,
$374 \times \tfrac{1}{120}\,(a_2' - \tfrac{5}{126}a_4') = 18.962,592$.
Thus: $374\,\mu_1''' = 2600.215,572$,
      $374\,\mu_2''' = 71077.370,741$.
From (xxxi) we have:
$9626\,\mu_1'' = -7768.448,082$, or, $1 + \mu_1'' = .1930$,
$9626\,\mu_2'' = 6736.839,298$.
Thus: $10,000\,\mu_1' = -5168.232,510$, and $\mu_1' = -.516,823$;
      $10,000\,\mu_2' = 77814.210,039$, $\mu_2' = 7.781,421$;
or, finally
Mean = .4832 year,
Standard Deviation = 2.7412 years.

Females

$n_1 = 9543$, and by (xxxii) $457\,a_1' = -942.176,2334$,
$n_2 = 204$,  $457\,a_2' = 3281.101,5644$,
$n_3 = 57$,   $457\,a_3' = -8958.636,6700$,
$n_4 = 49$,   $457\,a_4' = 22229.438,8090$,
$n_5 = 25$,   $457\,a_5' = -66617.544,9966$.

$457 \times \tfrac{1}{12}\,(a_1' - \tfrac{1}{60}a_3' + \tfrac{1}{2520}a_5') = -68.275,096$,
$457 \times \tfrac{1}{120}\,(a_2' - \tfrac{5}{126}a_4') = 19.991,508$.
Thus: $457\,\mu_1''' = 2527.224,904$,
      $457\,\mu_2''' = 76222.175,159$.
From (xxxi) we have:
$9543\,\mu_1'' = -7640.738,2653$, or, $1 + \mu_1'' = .1993$,
$9543\,\mu_2'' = 6604.373,5073$.
Thus: $10,000\,\mu_1' = -5113.513,361$, and $\mu_1' = -.511,351$;
      $10,000\,\mu_2' = 82826.548,666$, $\mu_2' = 8.282,655$;
or, finally
Mean = .4886 year,
Standard Deviation = 2.8322 years.


It is clear that in both cases the centring of those who die in the first year of
life is a little under 0.2, instead of slightly over 0.2 as in the English data. It is
worth while inquiring what the effect of concentrating the deaths in the first year
of life at 0.2 and then simply determining the crude moments will be. We find:























                      Concentration at 0.2    Concentration at centre     Complete corrections
                                              from (xxxi)*
                      Male      Female        Male      Female            Male      Female
Mean                  .496      .496          .489      .495              .483      .489
Standard Deviation    2.729     2.821         2.730     2.822             2.741     2.832
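The means in the first two pairs of columns follow from the first-year frequency, the chosen centre, and the uncorrected first moment of the remainder about age 1 ($374\,\mu_1''' = 2657$ for males, $457\,\mu_1''' = 2595.5$ for females). A short sketch under those assumptions; the function name is hypothetical.

```python
def mean_with_first_group_at(centre, n_first, rest_moment_about_one, n_rest):
    """Crude mean age when the first-year deaths are all placed at `centre`.

    rest_moment_about_one is the frequency-weighted first moment of the
    remaining deaths measured from age 1, so n_rest is added back to move
    the origin to birth."""
    total = n_first + n_rest
    return (n_first * centre + rest_moment_about_one + n_rest) / total


if __name__ == "__main__":
    for label, n1, rest_m1, n_rest, fitted in (("Male", 9626, 2657.0, 374, 0.1930),
                                               ("Female", 9543, 2595.5, 457, 0.1993)):
        for centre in (0.2, fitted):
            mean = mean_with_first_group_at(centre, n1, rest_m1, n_rest)
            print(f"{label:6s} centre {centre:.4f}: mean {mean:.3f}")
```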





This process then gives quite a reasonable value for the mean and standard
deviation. Thus all we have to do for a rough practical value is to use the $\mu_1''$ of
the first equation of (xxxi) to obtain the centring of the first group and then find
the raw moments only. For a very high first group this is considerably better
than applying our first non-asymptotic method and of course better than mere raw
moments. The following are values found from year groups:

                      1st Method of this paper      Raw moments
                      Male      Female              Male      Female
Mean                  .609      .610                .592      .592
Standard Deviation    2.73      2.82                2.72      2.81

* That is at .1930 and .1993.








The means are inadequate, but it is remarkable how close the standard deviations
are to the corrected values.

(14) The reader may occasionally be puzzled to settle whether a frequency
distribution has really a finite or infinite initial ordinate and therefore be in doubt
as to whether he should apply the first or second method of this paper. Our
Illustration III may be taken as a possible example of this, although the
exaggeration of the first frequency is nothing like so marked as in the case of
congenital malformations.

If we apply the first equation of (xxxi) to the first three days' period we find:

 0–3 days     $n_1 = 18.25$
 3–6  ,,      $n_2 = 6.58$
 6–9  ,,      $n_3 = 7.89$
 9–12 ,,      $n_4 = 5.65$
12–15 ,,      $n_5 = 5.82$

whence $18.25\,\mu_1'' = -14.057,688$, $\mu_1'' = -.77$,
or remembering our three days' unit, Mean = .69 day.


Our table now becomes:

 0–3 days             18.25    centred at   .69 days
 3–6  ,,               6.58       ,,        4.5   ,,
 6–9  ,,               7.89       ,,        7.5   ,,
 9–12 ,,               5.65       ,,       10.5   ,,
12–15 ,,               5.82       ,,       13.5   ,,
15 days–1 month       19.80       ,,        .75 months
 1–2 months           22.59       ,,        1.5   ,,
 2–3  ,,              18.58       ,,        2.5   ,,
 3–4  ,,              15.96       ,,        3.5   ,,
 4–5  ,,              13.30       ,,        4.5   ,,
 5–6  ,,              11.51       ,,        5.5   ,,
 6–7  ,,              10.61       ,,        6.5   ,,
 7–8  ,,               9.30       ,,        7.5   ,,
 8–9  ,,               8.74       ,,        8.5   ,,
 9–10 ,,               8.29       ,,        9.5   ,,
10–11 ,,               7.51       ,,       10.5   ,,
11–12 ,,               6.94       ,,       11.5   ,,
Total                197.32


Hence by raw moments we find:

Mean               = 112.98 days as against 114.61 days,
Standard Deviation = 105.53 days as against 107.15 days,
found by the first method of this paper.

Here we have not used our full second method, but the results are in fairly close
accord, especially in view of the fact that we have not corrected for the abrupt
curtailment at the end of the 12 months. Accordingly the suggestion made is that
in doubtful cases both methods will give fairly closely the same values, and therefore
we need not worry over which is the more correct one to apply.

















PECCAVIMUS! 


This paper is devoted to a number of slips recently made by the Biometric 
School and which it is desirable to correct at once, before the formulae which need 
correction pass into general use. Some of these slips are due to war haste, others 
to neglect of terms which ought to have been included in our approximations and 
some to printers’ errors. We have to thank Professor Tchouproff of Petrograd for 
indicating the existence of several of these mistakes. 


(I) Biometrika, Vol. XI, p. 215. On the Probable Error of a Coefficient of
Contingency without Approximation. By Andrew W. Young and Karl Pearson. 

Down to p. 222, equation (xii), this has been again checked without discovery 
of any error. But on that page the authors “take M to be very large compared 
with N and make yx, = y2= Xs = X.=1” by an oversight. The values of the x’s are 
given on p. 217, equation (vi), and clearly when M is very large compared with N,
1 =X. =X, = 1 and y,=1—2/N. Accordingly equations (xiii) and (xiv) of p. 222 
for samples from “an infinite population” require modification and should be*: 


oo=$ [HG -(0CR 
+ ys [08% | - -48 (Fe 8 (F) +108 (7x) - 128 (753) 
+m lsGe)- SQ} - 6965) + 49GR) 8G) 

















+ [es(R) v0 wef) mG) 
—(2— 4g) c— 16g" + 10¢¢- 2] 
+m [SE +5) 18 GS-98) + #8) 


-s (%) (Re — 46 + 8) — Gp + 124% + dept — ct — 20 +2] 








* The changes due to x; affect the term in 1/N%, but the original (xiii) has a wrong sign to the third 
term in 1/N?. 
We may now turn to the numerical illustrations. It will be sufficient to show
the correct values of $\sigma_{\phi^2}$ in the table on p. 224.

















                    First and second terms of (xiv)    All terms of (xiv)
Old Values          .02709                             .02729
Corrected Values    .02725                             .02744





For practical purposes, these would all be taken as .027, and accordingly the
errors, although sufficiently distressing, do not modify the conclusion that for
a sample over 1000 the first and second terms of (xiv) are adequate. In the second
example, p. 227, more serious changes are made, chiefly owing to the error in the
sign of the second-order term $(2 - 4\phi^2)\,c$, which becomes of greater importance now
that N is reduced from 1801 in the first illustration to the 218 of the second
illustration. We have for $\sigma_{\phi^2}$:








                    First and second terms of (xiv)    All terms of (xiv)
Old Values          .0798                              .0823
Corrected Values    .0693                              .0719














Thus for practical purposes the .069 of the first- and second-order terms is only
raised to .072 if we include the third-order term. We may therefore conclude that
250 cases marks something like the limit at which we need to consider the
third-order term as well as the first- and the second-order terms.


We now turn to the test for zero-contingency. Equation (xvii) of the original 
paper is correct, but the wrong value of y, was inserted to obtain (xviii); it should 
of course be 1—2/N. This leads to 
1 Fe c(c — 2)-—2(c?—-1) 


o' 


eee +0e- yh pee (xviii), 


H N 


or perhaps as it is better expressed : 


1 1 1 . —— 
oy = wy \s (5) +2 (a ~ x) (c-—1)- WI Shedeeel (xviii) bis. 


The formulae summarised on p. 229 must be altered to accord with the results
given above. (C) must be (xiv) of the present paper. (D) must have $-2c$ and
not $+2c$ for its last term. (C') must be (xviii) above.


(II) The object of our next note is to make some additions and corrections provided
by Dr Isserlis himself to his paper: "On the Conditions under which the 'Probable
Errors' of Frequency Distributions have a real significance" (R. S. Proc. Vol. 92, A,
pp. 23–41, 1915). In that paper he gave the values of the frequency constants
$B_1$ and $B_2$ (formulae (19) and (23), pp. 30 and 31) of the distribution of the
moment-coefficient of any order u about a fixed origin for a sample of size n drawn
from a population of size N. These formulae are exact and no alterations are
proposed here in them nor in any conclusions drawn from them. In the latter part of
the paper Dr Isserlis deals with the value of the β-constants for moment-coefficients
referred to the mean of the sample. These latter values were approximate and


intended to be correct to terms in = . We are indebted to Professor Tchouproff 


for pointing out that there is an error in the approximation, for one of the neglected
terms rises. When the correction is made, however, the statement (p. 24) remains
true that "for coefficients of high order the sample has to be an inconveniently
large fraction of the population itself if $\beta_1$ and $\beta_2$ are to approach even approximately
their Gaussian values" (i.e. 0 and 3). The results in the paper cited are exact and
correct* until section 5 (p. 35) is reached. In that section, formulae (38), (39) and
(41) are approximations and for the purposes of the paper should be given correct


to terms in = for (38) and to terms in - for (41). The use of the incomplete value 
$d\mu_u = \frac{1}{n}\,S(dn_s X_s^u) - u\,\mu_{u-1}\,d\bar{x}$


in equation (37) has introduced an error in the value of $M_3$ given by equation (39).
We proceed to amend this error.


We have $\mu_u = \frac{1}{n}\,S\{n_s (x_s - \bar{x})^u\} = \frac{1}{n}\,S(n_s X_s^u)$, $dX_s = -d\bar{x}$;

$$\therefore\ d\mu_u = \frac{1}{n}\,S\{dn_s X_s^u - u\,n_s X_s^{u-1}\,d\bar{x}\}
 + \frac{1}{n}\,S\{-u\,dn_s\,d\bar{x}\,X_s^{u-1} + \tfrac{u(u-1)}{2}\,n_s X_s^{u-2}\,d\bar{x}^2\} + \ldots$$

$$= A + B + \text{terms of third and higher orders in } dn_s,\ d\bar{x}.$$

Now it is well known that the mean value of fifth and higher powers of $dn_s$,
$d\bar{x}$, ... contains no terms of lower degree than the third in $1/n$.

In the formulae (38), (39) and (41) the values of $M_2$, $M_3$, $M_4$ were obtained as
the mean values of $A^2$, $A^3$ and $A^4$ respectively. The inclusion of the neglected
terms does not affect $M_2$, which is given correct to $1/n$, nor $M_4$, for the only term
of the fourth order in $dn_s$, $d\bar{x}$ in $(A + B + \ldots)^4$ is $A^4$.

But $(d\mu_u)^3 = A^3 + 3A^2B$ + fifth-order terms in $dn_s$, $d\bar{x}$.


* There are some obvious printers' errors overlooked in proof, of which the omission of the factor
(u’u)4 — 2y’2y, (u’y)? + (wou)? in the first line of equation (21) is most likely to mislead. It may also be 
noted that the factor 2? is missing in the first term of (26) and the factor 3 in the first term of (41). 
Hence the correction to be applied to the value of $M_3$ in formula (39) is the
mean value of $3A^2B$.

Now $A = \frac{1}{n}\,S(dn_s X_s^u) - u\,\mu_{u-1}\,d\bar{x}$

and $B = \frac{1}{n}\,S(-u\,X_s^{u-1}\,dn_s\,d\bar{x}) + \tfrac{u(u-1)}{2}\,\mu_{u-2}\,d\bar{x}^2$,

so that

$$A^2 = \frac{1}{n^2}\,S(dn_s^2 X_s^{2u} + 2\,dn_s\,dn_{s'}\,X_s^u X_{s'}^u) - \frac{2u\,\mu_{u-1}}{n}\,S(dn_s\,d\bar{x}\,X_s^u) + u^2\mu_{u-1}^2\,d\bar{x}^2.$$

Let us write $A^2 = L + M + N$ and $B = H + K$.
HL--"% [* (X"—'dnfdz) + S(X 2" X " dnZdn,dz) 
: n* +28 (X27 X,"dnZdn,dz) + 28S (dnedn,dn,dZX "7X" X, »} 


Denoting the mean value of HL by HL we have (see (31), (32), (33) of paper cited) 
HL = — | 8{xeeX|agn, (1—%) + $'] mex i 
+8 (x wa Xe 4 2X Xr) ay Mae aC (1-")+£) x,- (29+ ) x,]} 


~ 28 {yp ie? (X, + Xi +X) lig 





=f [ae (E29) xeon C= 2) arse sanz 
— 28 i(“ 3) (XH Xr + 2X mx} 


i 2s {ene Mey (x, 4 X,4 Xp) XXX, | 








+8 {Per (XX "+ oY aXe — Xe Xo axeexe)}] 








ied uxd |- 39 ze} 2° se Nt (X, au X ox nox 





- 8 PM axe 44x eX} 
— 29 {Nee (BX eXHXy"+ XeeXe 7X," 


+35 (™ xe) +8 [ren (XX, + axe—xe)}] 
- E +8 {" (XX " + WX MX — OX Xe X aexe}] 


ee ux 2 Np? — 27 py Mu uss | 
| + Npouplu + 20 fous Muys 





ae 





[Hsu + 2 pou Musa — Peubu — Moutifu—a] 3 


3 
or mean value of HL* = “x {¢ "s + 2p brabus ) 
n = Pew bu — 2prou—s buys 


= £ (Mau + 2pleu—a uta — Hou pu — porite)} > 





HM = “eens [S {X."*dn,2da"} + S{X,"X;"dn,dn,dz"}], 


so that, using (34), (35) of the original paper 
HM = — [s {x ey | ( (1 = *) + £) +X} (2p ‘ ®t 
+8 {xrXity ae | eex, Hed +12, ai 
ied od Cm 
428 { xn xe} ape i x." xe} ] 


$e £ | m8 5 re +8 { x 











g {mene 


s fh cen xoxo} 





= x|4 (a feu—1 — PePuPu-1 4 2 iu Muti) 
b’ ( 
er Me Meu-1 = Peuti — Pu-Pu+e mF BuPu+ . 


. Wu es = 
Again HN =- —n S(X,"—dn,dz*), 
so that using (36) 
HN =— “He | gx (396 aX. x6 S(XP+ 80X.—m))} | 


us 3 = . 
=— = 2. ¥ | 3 at + £ (Muze 5 3a pu es pot) | ’ 


KL = “(tHe 9 (X dnd} + 28 [XX idndn dz} ]. 





* It must be remembered that ¢ and — £ are of same order in =. Cf. equation (5), p. 26, Le. 
Therefore, using (34) and (35),
ate Dowal g Lym 1, (gia ¥), xelogt a® 
pastebin rot [u(6i-Bo$) ox-Q0% 4} 


Ng 
n? 


+28 {Xt Xety E (2X,X,—p»)—(X,- Xp all 


2n? n rn? 


a= it Dates | xd {(" ~%) xe—3"% x9 xl 
+ x8 2 ae X24 4 xX x} 
+ $ pS Ns y 2 +8 Ne ¥ au+a 
Xin [Pinte e; 
NgN . 
= 28S {a (XX * pee 2X etx tt + xox] | 


n? 


u(u—1) py 
ane a) x E {Ms (fae — fr?) + 27x41} 


, 


7 {oa blow + Mure + 2 us — 2a) , 


Ku =" Dies (. Seer) S {dn,daX,"}. 


Therefore, by (36), 


n 


para ae wuztla— q 
RM = —S0N— Necstns Lx (Sy Xe + xb 2 (X2-+ SuaXe— mh 


= u? (u — 1) by / 
M=- ( é u—1 bu—2 x {Supe 3 £ (Mu+s + Slebut _ Hm} g 
rs TEs fh Mums az . 


and using (26) of the original paper therefore 


, 


3 Pi 2 
Mean KN =~“ we = | Su +X e (Ms + 34)| 





Adding these various terms to the mean value of M, as given by (39) of paper cited 
we find for the corrected value: 


M, = x [2pu® — SHuplon + fou — BU pur (Mey: — 2utifu) 


+ Bu? wus (Muse — Mobu) — U (wus) ] 





> 
3
== 2 {¢ | ut + 2 prea Pupruta — Hau ple — 2pruts Hour) U 


+ Qe? pus (Me foua — Mop fu + 2uu pugs) 


‘ u(u—1 
— W py: (Bpebu) + = ) ; Pu—2 (Maplon — Pobn® + 27041) 


3(u—1 
—w(u— 1) Bu-2hu-1 (3s Huts) + Ceaae ua bu—a (3u2)| 





+ e |- U (Hau + Zinta ow — Hubtou — Hua fous) 


+ Qu? pu (Me few + Hows — Mu-1Puse + Pubuss) 
— UF wy (Mure + Sis bu — Hs fu-1) 


u(u—1 
* ae Buz (Mabou + Mate + 2p ug, — 2p pute) 


— (U1) bua bu—2 (Muys + SHepfusr — Hsu) 


3 ae | 4 
+ a Mur Mu—2 (uy + 3u)|} ? 


and (40), (46) and (47) must be modified accordingly. It remains true that for 
a normal population M, vanishes when w is odd, and that in all cases B, oc 3 


If we write this value of M, in the form 


, is 
M,=%XF 5 ee eet 


nr? n> 





> 


then in order to obtain B, correct to : , the third term may be omitted, R has the 


same value as in Equation (47) of the paper cited and X is zero for normal distri- 
butions when u is odd. For even values of « in normal distributions the value of 
K is 


—— aos (Hofeou — Ho bu*), 


u(u-—1 
1 (je? — pope) + ) 


which easily reduces to —4u x P x jy, where P =p. — px? as in Equation (52). 
We may therefore add to Table I on p. 39 of the original paper the following 
column : 








u K 

2 = Dye? 

I 0 

24 576 ps9 

5 O 

6 | —~ 457,650p0!” 





Biometrika x11 
In Table II on p. 39, the third column is unaltered, the second column becomes
(the coefficients of ¢ being approximated) 





r 
u B, 
2 8 (y -0°759)" 
n x 
3 0 
- 102 (x’-0°108¢)? 
n x 

5 0 
3 1099 (x’- 0°0409)? 

" x 














The corrected form of Table III (p. 40 of paper cited) is now as follows:


Table III. Approximate values of $\beta_1$, $\beta_2$ for samples of 1000 out of
a population of 1,000,000.

  u      $\beta_1$     $\beta_2$
  2      0.001     3.012
  3      0.000     3.090
  4      0.081     3.204

| 





Thus the effect of the correction is to change the values of $\beta_1$ for u = 2 and
u = 4 from the values 0.008 and 0.102 to 0.001 and 0.081 respectively, but it
remains true that the frequency of the fourth moment-coefficient differs appreciably
from the normal distribution.


(III) Dr Isserlis also wishes to make the following emendations in his paper
in the last number of Biometrika, Vol. XII, p. 134. On p. 138 near the foot $+ABC$
has been dropped from the bracket $(8FGH + 2AF^2 + 2BG^2 + 2CH^2)$. Also in l. 6
of the same page for "on Q" read "and Q."


(IV) The point indicated by Professor Tchouproff, namely: that fourth-order
mean products are of the same order finally in $1/n$ as third-order mean products and
cannot be neglected therefore in comparison with third-order mean products, is of
great importance in investigations into the probable errors of frequency constants
in the case of small samples. In expanding functions of the deviations from mean
values of subfrequencies such as $\delta n_s$ we cannot neglect products of the fourth
order in the $\delta n_s$'s compared with products of the third order. In obtaining results
true to products of an odd order in the "statistical differentials" we must proceed
to products of the next highest even order to reach correctness.

This principle, which is almost self-obvious, was, however, overlooked by Pearson
in his paper "On the Application of 'Goodness of Fit' Tables to test Regression
Curves and Theoretical Curves used to describe observational or experimental
Data," in Biometrika, Vol. XI, pp. 239–261.


One of the objects of that paper was to investigate the probable errors and
frequency distributions of errors in the mean and standard-deviation of an array.
If we have an array of a first variate corresponding to a small subrange of a second
variate in a sample of N, the law of distribution of the means and
standard-deviations of such arrays when many samples of N are taken had not been
investigated at the time Pearson wrote. If there be $n_p$ individuals in such
a sample, then the problem differs from the ordinary problem of the distribution
of means and standard-deviations in a sample of size $n_p$, in the fact that $n_p$ in the
case of the array varies from sample to sample. Hence we cannot straight away
assume that if $\bar{n}_p$ be the mean number in the array then $\sigma_{n_p}/\sqrt{\bar{n}_p}$ and $\sigma_{n_p}/\sqrt{2\bar{n}_p}$
will be the standard-deviations of the distributions of means and of
standard-deviations of the arrays; still less do we know how far it is legitimate to
suppose these distributions approximate to the Gaussian or normal type. As the
problem is an exceedingly important one the writer asked Miss Eleanor Pairman to
revise his work of 1916 by introducing where needful the fourth-order products. This
she has done with certain additions and expansions.


(a) From the equation on p. 239 we have: 
mean (Sm,) = mean & (—)* 8 (roe (Fy i “eget (Fy 
where = is a summation for every value of a from 1 to . 
But mean $\delta n_{qp}\,\delta n_p = \bar{n}_{qp}\left(1 - \dfrac{\bar{n}_p}{N}\right)$

and the regression relation is accordingly

$$\delta n_{qp} = \frac{\bar{n}_{qp}}{\bar{n}_p}\,\delta n_p.$$

Substituting this we see that every term vanishes and accordingly mean $(\delta m_p) = 0$,
not merely to a high order of approximation, but absolutely. In other words the
mean of the means of any array, notwithstanding that the number in that array
will vary, is equal to the mean of that array in the sampled population.
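The product-moment used here, mean $\delta n_{qp}\,\delta n_p = \bar{n}_{qp}(1 - \bar{n}_p/N)$, can be confirmed by exact enumeration of a small multinomial sample. The sketch below is only an illustration: it lumps the population into three classes (the cell $q$ of array $p$, the rest of array $p$, and everything outside the array), and the function name and chosen chances are assumptions.

```python
from fractions import Fraction
from math import factorial


def cov_cell_with_array(N, a, b):
    """Exact covariance of n_qp with n_p for a sample of N.

    a = chance of falling in cell q of array p,
    b = chance of falling in the rest of array p."""
    c = 1 - a - b
    e_cell = e_array = e_product = Fraction(0)
    for i in range(N + 1):              # i = n_qp
        for j in range(N + 1 - i):      # j = rest of array p, so n_p = i + j
            k = N - i - j
            prob = (Fraction(factorial(N), factorial(i) * factorial(j) * factorial(k))
                    * a ** i * b ** j * c ** k)
            e_cell += prob * i
            e_array += prob * (i + j)
            e_product += prob * i * (i + j)
    return e_product - e_cell * e_array


if __name__ == "__main__":
    N, a, b = 6, Fraction(1, 10), Fraction(3, 10)
    nbar_qp, nbar_p = N * a, N * (a + b)
    print(cov_cell_with_array(N, a, b), nbar_qp * (1 - nbar_p / N))
```

Dividing this covariance by the variance of $n_p$, namely $\bar{n}_p(1 - \bar{n}_p/N)$, gives the regression coefficient $\bar{n}_{qp}/\bar{n}_p$ quoted in the text.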


(b) We have for the pth array:

$$m_p + \delta m_p = \frac{S\{(\bar{n}_{qp} + \delta n_{qp})\,x_q\}}{\bar{n}_p + \delta n_p},$$

$$\delta m_p = \frac{\bar{n}_p\,S(\delta n_{qp}\,x_q) - S(\bar{n}_{qp}\,x_q)\,\delta n_p}{\bar{n}_p\,(\bar{n}_p + \delta n_p)}.$$

But $\delta n_p = S(\delta n_{qp})$ and $S(\bar{n}_{qp}\,x_q) = \bar{n}_p\,m_p$, and accordingly

$$\delta m_p = \frac{S\{\delta n_{qp}\,(x_q - m_p)\}}{\bar{n}_p + \delta n_p} = \frac{S\{\delta n_{qp}\,\xi_q\}}{\bar{n}_p + \delta n_p},$$

where $\xi_q$ is measured from the sampled population array mean.

Now we desire to obtain the various moment-coefficients of $\delta m_p$, or mean $(\delta m_p)^s$,
which for convenience may be written $\{\delta m_p^s\}$.


There are two ways at first sight of doing this:

(i) We may expand $(\bar{n}_p + \delta n_p)^{-s}$ in terms of $\delta n_p/\bar{n}_p$ and then take the mean
values of products of powers of $\delta n_p$, $\delta n_{qp}$, $\delta n_{q'p}$, $\delta n_{q''p}$, ....

This was the process adopted in the original memoir. It is very laborious and
the algebra so lengthy as to lead easily to slips. Still on the present occasion we
went to terms of a high order (O,) and some of the results obtained will be so
useful in other investigations on probable errors of frequency constants that it seems
worth while placing them on record here. The fourth-order mean products in $\delta n_p$
and $\delta n_{qp}$ may be added to those given on p. 245 of the original memoir.

They are:


= E (SnySn* gp) = Figp (1 - #) {1 +3 (1 _ 7) Nap (1 = *)t ; 


: E (SnySn%yy Sngy) = (1 ~ ®) RopNgp (1 - *) (1 “ oe) , 


N 
1 2\ Nop Tgp tign n 
= (SnpSnngySngydny'p) = — 3 (1 m 7) "wtaete's (1 4 *) ; 


= E (Bg! BnYyp) = Hip (1 - 2) {i +(1- x) | 2a + iip{1 |. 
5 2 (B14? Bingp’tgy) = (1 — Fp) Papa» (1 — 52) (2 32, 


© 3 (814? 3gp) = Tip (1 -%) {i +3(1- 7) fp (1 - R) 


For the fifth-order mean products, Miss Pairman also provided the following 
values* : 


+ % (Brgy) = Fy (1 -*e) ( - 7) {1+2 (5-5) Ree (1 -*e)}, 


% (Sng Sngy) = “aor (1 - ae) {i +2 (5 - w) Tigp (a se *?)| 
> 5 (8n%gp By) = Fipg ip {1 Se ) Fx 


- (5-5) 1 ~ " _ "at (g - See) 
N/ N } ae i N 

* These results are of course perfectly general, that is to say we can suppress p and suppose them 
the mean variation values of elements ng, ng’, Ny, Ng and Nqy of any frequency distribution. 
1 Tiaphiypfigry {, 2 6\i 4ii
ttf -3-(6-2)0- 

1 2 Taphgptg'p fo (/,_ 1 s 5) (Rep t+ iigp  Aigniigp 
eat) =F ? 0 - x)-(5-») Ce - Pee) 
1 6 \ Ngp Nop Ng'pNg'y / 4n 

x % (Sn*gp Srgp SNq"p SNgry) = (5 = m) a m (1 = ~ee , 

: 1, © Gaping yp BypBryny) = — 4 (5- yr) ease teehee here 


Alongside these we give the fifth-order combinations of $\delta n_p$ and $\delta n_{qp}$, which are
deduced from these and were required for our purposes:


Erasmo HH) val) ¥CW 

S & (B1°3n%yy) = Rp (i- ¥) L—iip + (1 +) ip 

+(¢-$)0-¥) fv+50(-)]} 

= 35 (8ng!Bigp 3M») = “an (1 = =) f = i+ (5 f a) ( & =) ( s | , 

: = (Sng? n° gy) = Fig (1 = 7) fis (1 5 fiy — 6 (1 = x) fins 
+(s-$) [0 (¢-¥)-@-9)¥]} 


: = (Sn,75n*pdngp) = — Tigptigp ( ? 7) {2 (1 Me x) 


1S (8n,°8igpdtypBty') = = (5 — yy) eegetee ( a *) (3 a 4%) : 


x 4 NV W 
2 5 (npn) = ip (1 - Say) ( % 7) {1 4 2(5 -¥) figp(1 id ‘g , 
52 Cade) oko (1-H) (0-5) (6-5) 90 - SP)} 
: Z 


ra 


= & (Snpbn*_pdn*yp) = Nyptgrp (1 - 72) {2 (1 — wv) 





1 jigpTigpTigrp [, _ Tip\ [- 4ii 
5 2 (Bp BnXypdnyp3ng’y) = — “9PM? (1 — *) (5 -5) (1 - Se), 


1 (Sn SnqpBny8’p Brgy) = PMCAHY'P He (1 _ Fe) 4 (5 — $) 


2 
Also we give here additional moments* of the binomial $(p + q)^n$ about its mean:

$\mu_1 = 0$, $\mu_2 = npq$, $\mu_3 = npq\,(p - q)$,
$\mu_4 = npq\,\{1 + 3(n - 2)\,pq\}$, $\mu_5 = npq\,(p - q)\,\{1 + 2(5n - 6)\,pq\}$,
$\mu_6 = npq\,\{1 + 5(5n - 6)\,pq\,(1 - 4pq) + 15n(n - 2)\,p^2q^2\}$,
$\mu_7 = npq\,(p - q)\,\{1 + 4pq\,(14n - 15) + p^2q^2\,(105n^2 - 462n + 360)\}$,
$\mu_8 = npq\,\{1 + 7pq\,(17n - 18) + 14p^2q^2\,(35n^2 - 154n + 120) + 7p^3q^3\,(15n^3 - 340n^2 + 1044n - 720)\}$.
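These closed forms can be compared with moments computed directly from the binomial terms. The sketch below uses exact rational arithmetic and assumes the variate counts the events of chance $q$, which is what the sign $(p - q)$ of the odd moments implies; the function names are illustrative only.

```python
from fractions import Fraction
from math import comb


def central_moment(n, q, s):
    """s-th moment about the mean of the number of events of chance q in n trials."""
    p = 1 - q
    mean = n * q
    return sum(comb(n, k) * q ** k * p ** (n - k) * (k - mean) ** s for k in range(n + 1))


def closed_forms(n, q):
    """The moments mu_2 ... mu_8 as given in the text."""
    p = 1 - q
    pq = p * q
    return {
        2: n * pq,
        3: n * pq * (p - q),
        4: n * pq * (1 + 3 * (n - 2) * pq),
        5: n * pq * (p - q) * (1 + 2 * (5 * n - 6) * pq),
        6: n * pq * (1 + 5 * (5 * n - 6) * pq * (1 - 4 * pq) + 15 * n * (n - 2) * pq ** 2),
        7: n * pq * (p - q) * (1 + 4 * pq * (14 * n - 15)
                               + pq ** 2 * (105 * n ** 2 - 462 * n + 360)),
        8: n * pq * (1 + 7 * pq * (17 * n - 18)
                     + 14 * pq ** 2 * (35 * n ** 2 - 154 * n + 120)
                     + 7 * pq ** 3 * (15 * n ** 3 - 340 * n ** 2 + 1044 * n - 720)),
    }


if __name__ == "__main__":
    n, q = 2, Fraction(1, 4)
    for s, value in closed_forms(n, q).items():
        print(s, value, central_moment(n, q, s))
```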


The values obtained in the above laborious manner for $\{\delta m_p^s\}$ agreed as far as
we proceeded with those obtained by the following or second method.


(ii) This second method consisted in first summing for $\delta n_{qp}$, on the assumption
that $\delta n_p$ was constant, and then summing for $\delta n_p$. This involved some new results
which will be useful in other problems and are recorded here.


For constant 7i, + dnp: 
hea i%, 
Mean (8ngp)? = (1 + = Tip (Tip — Tgp) + Tip? dn,’, 
SNy\ TignNg'p . Nop, 
Mean (Sng dngp) = (1 + =) NaopNap eT op "gp 5n, 


Np Np Ny” 


n n n> 
Mean (87)? = (i+ +o) —? (Tip — Tigp) {1- - + 36n, = + <2 dn,', 
P Np 


p 


i a 
Mean (8n7ypdnqp) = -(1 + 3) She Naphap (1 — dn, — oe -* 35n be) + ere 8n,', 


Np Np Pp 
Mean (SngpSnrq'pSnq"p) = see (i, + 8n,) (2 — 38n,) + 8n,'} 
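With the array total held fixed, the cell frequencies are multinomial with chances $\bar{n}_{qp}/\bar{n}_p$, so these conditional means can be checked by exact enumeration. The sketch below does this for an array of three cells; the function name, the cell means (2, 3, 5) and the deviation $\delta n_p = 2$ are arbitrary illustrative choices, not figures from the paper.

```python
from fractions import Fraction
from math import factorial


def conditional_means(nbar, d):
    """Exact means of deviation products for a three-cell array of fixed size.

    nbar = (mean of cell q, mean of cell q', mean of cell q''), integers here,
    d    = deviation of the array total from its mean, so n_p = sum(nbar) + d."""
    nbar_p = sum(nbar)
    n_p = nbar_p + d
    pi = [Fraction(x, nbar_p) for x in nbar]
    out = {"sq": Fraction(0), "mixed": Fraction(0), "triple": Fraction(0)}
    for i in range(n_p + 1):
        for j in range(n_p + 1 - i):
            k = n_p - i - j
            prob = (Fraction(factorial(n_p), factorial(i) * factorial(j) * factorial(k))
                    * pi[0] ** i * pi[1] ** j * pi[2] ** k)
            dq, dq1, dq2 = i - nbar[0], j - nbar[1], k - nbar[2]
            out["sq"] += prob * dq ** 2            # mean (dn_qp)^2
            out["mixed"] += prob * dq ** 2 * dq1   # mean (dn_qp^2 dn_q'p)
            out["triple"] += prob * dq * dq1 * dq2 # mean (dn_qp dn_q'p dn_q''p)
    return out


if __name__ == "__main__":
    nbar, d = (2, 3, 5), 2
    nq, nq1, nq2 = (Fraction(x) for x in nbar)
    n = Fraction(sum(nbar))
    print(conditional_means(nbar, d))
    # The same three quantities from the formulae in the text:
    print((1 + d / n) * nq / n * (n - nq) + (nq / n) ** 2 * d ** 2)
    print(-(1 + d / n) * nq * nq1 / n * (1 - d - 2 * nq / n + 3 * d * nq / n)
          + nq ** 2 * nq1 / n ** 3 * d ** 3)
    print(nq * nq1 * nq2 / n ** 3 * ((n + d) * (2 - 3 * d) + d ** 3))
```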


Now the value of this method was at once obvious, for proceeding to the
summations in the moment-coefficients of $\delta m_p$, for constant $\bar{n}_p + \delta n_p$ we found that
they corresponded with values to be found for the distribution in a sample $n_p$ of
constant size. In other words we reached a conclusion, which should have been
obvious at first sight, namely that to find the value of

Mean $(\delta m_p)^s$

all that we have to do is to write down the known value for $n_p = \bar{n}_p + \delta n_p$ constant and
then sum for $\delta n_p$. We might have pulled down the scaffolding in this correctional
paper and simply started from this result, but as several of the means reached in
processes (i) and (ii) seemed likely to be of value, we have preferred to indicate
the steps which led us to the final method.


* A simple reduction formula for the moments of a binomial about its mean was sought in vain. 
After a good deal of energy had been spent on the problem, we believe that u, being the sth moment 
about the mean 


as 
n=[Saoseery],_, 


is, perhaps, the easiest expression for reaching these moment-coefficients by successive differentiation. 
(iii) For a sample of constant size $n_p$ the following are the moment-coefficients
of the variation in the mean*:

$$\text{mean}\,(\delta\mu_1')^2 = \frac{\mu_2}{n_p} = \frac{\sigma_{n_p}^2}{n_p},$$

$$\text{mean}\,(\delta\mu_1')^3 = \frac{\mu_3}{n_p^2},$$

$$\text{mean}\,(\delta\mu_1')^4 = \frac{3\mu_2^2}{n_p^2} + \frac{\mu_4 - 3\mu_2^2}{n_p^3}.$$

Now let $S(1/n_p^s)$ equal the sum of $1/n_p^s$ for all values of $n_p$ which may occur in the
samples of N. If therefore $f_{n_p}$ is the frequency with which $n_p$ occurs, the whole
problem reduces to finding the values of
$S(f_{n_p}/n_p^s)$,
for various values of s.

Now the frequencies of the $n_p$ are simply the terms of the binomial. The term
in which $n_p = 0$ must not, we think, be taken into consideration, for in this case
there is no variability in $\mu_1'$ as there is no frequency in the array, i.e. $\delta\mu_1'$ must be
put zero. Thus in the notation of a binomial $(p + q)^n$ we require to find:

$$n\,p^{n-1}q\,\frac{1}{1^s} + \frac{n(n-1)}{1\cdot 2}\,p^{n-2}q^2\,\frac{1}{2^s} + \frac{n(n-1)(n-2)}{1\cdot 2\cdot 3}\,p^{n-3}q^3\,\frac{1}{3^s} + \ldots \qquad \text{(F)}$$

and to divide the result by $(p + q)^n - p^n = 1 - p^n$. This finite series we have not
succeeded in summing. Before indicating how we may approximate to it by the
mean-powers of $\delta n_p$, we can look at the problem from two other standpoints.


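(i) If $\bar{n}_p/N = q$ be not small the binomial approximates to a Gaussian of
standard-deviation squared $\sigma^2 = npq = \bar{n}_p\,(1 - \bar{n}_p/N)$. Hence

$$S\left(\frac{1}{n_p^s}\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty}\frac{1}{(\bar{n}_p + x)^s}\,e^{-x^2/2\sigma^2}\,dx$$

$$= \frac{1}{\sqrt{2\pi}\,\bar{n}_p^s\,\sigma}\int_{-\infty}^{+\infty}\left(1 - s\,\frac{x}{\bar{n}_p} + \frac{s(s+1)}{1\cdot 2}\,\frac{x^2}{\bar{n}_p^2} - \ldots\right)e^{-x^2/2\sigma^2}\,dx$$

$$= \frac{1}{\bar{n}_p^s}\left\{1 + \frac{s(s+1)}{1\cdot 2}\,\frac{\sigma^2}{\bar{n}_p^2} + \frac{s(s+1)(s+2)(s+3)}{1\cdot 2\cdot 3\cdot 4}\,\frac{3\sigma^4}{\bar{n}_p^4} + \ldots\right\}.$$

Thus

$$S\left(\frac{1}{n_p}\right) = \frac{1}{\bar{n}_p}\left\{1 + \left(\frac{1}{\bar{n}_p} - \frac{1}{N}\right) + 3\left(\frac{1}{\bar{n}_p} - \frac{1}{N}\right)^2 + \ldots\right\},$$

$$S\left(\frac{1}{n_p^2}\right) = \frac{1}{\bar{n}_p^2}\left\{1 + 3\left(\frac{1}{\bar{n}_p} - \frac{1}{N}\right) + 15\left(\frac{1}{\bar{n}_p} - \frac{1}{N}\right)^2 + \ldots\right\}, \qquad \text{(G)}$$

$$S\left(\frac{1}{n_p^3}\right) = \frac{1}{\bar{n}_p^3}\left\{1 + 6\left(\frac{1}{\bar{n}_p} - \frac{1}{N}\right) + 45\left(\frac{1}{\bar{n}_p} - \frac{1}{N}\right)^2 + \ldots\right\}.$$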


* Here $\mu_s$ is the sth moment-coefficient of the $n_p$ array about its mean in the sampled population.
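(ii) Another method is to assume a Pearson Type III curve: $y = y_0\,x^{a} e^{-\gamma x}$, which is known to give a better approximation than the Gaussian to the binomial. We assume it to start at the beginning of the first subrange of the binomial and to have the same mean and standard-deviation. These conditions involve

$$\bar n_p = \frac{a+1}{\gamma}, \qquad \sigma^2 = \bar n_p\left(1 - \frac{\bar n_p}{N}\right) = \frac{a+1}{\gamma^{2}}.$$

Accordingly

$$S\!\left(\frac{1}{n_p^{\,s}}\right) = \frac{y_0}{A}\int_0^{\infty} x^{a-s} e^{-\gamma x}\,dx = \frac{y_0}{A}\,\frac{\Gamma(a-s+1)}{\gamma^{\,a-s+1}},$$

where $A$ = total frequency $= y_0\,\Gamma(a+1)/\gamma^{\,a+1}$.

Hence

$$S\!\left(\frac{1}{n_p}\right) = \frac{1}{\bar n_p}\,\frac{1}{1 - \left(\dfrac{1}{\bar n_p} - \dfrac{1}{N}\right)},$$

$$S\!\left(\frac{1}{n_p^{2}}\right) = \frac{1}{\bar n_p^{2}}\,\frac{1}{\left\{1 - \left(\dfrac{1}{\bar n_p} - \dfrac{1}{N}\right)\right\}\left\{1 - 2\left(\dfrac{1}{\bar n_p} - \dfrac{1}{N}\right)\right\}},$$

$$S\!\left(\frac{1}{n_p^{3}}\right) = \frac{1}{\bar n_p^{3}}\,\frac{1}{\left\{1 - \left(\dfrac{1}{\bar n_p} - \dfrac{1}{N}\right)\right\}\left\{1 - 2\left(\dfrac{1}{\bar n_p} - \dfrac{1}{N}\right)\right\}\left\{1 - 3\left(\dfrac{1}{\bar n_p} - \dfrac{1}{N}\right)\right\}},$$

which are exact. Or, approximating

$$S\!\left(\frac{1}{n_p}\right) = \frac{1}{\bar n_p}\left\{1 + \left(\frac{1}{\bar n_p} - \frac{1}{N}\right) + \left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{2} + \dots\right\},$$

$$S\!\left(\frac{1}{n_p^{2}}\right) = \frac{1}{\bar n_p^{2}}\left\{1 + 3\left(\frac{1}{\bar n_p} - \frac{1}{N}\right) + 7\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{2} + \dots\right\}, \qquad \text{(H)}$$

$$S\!\left(\frac{1}{n_p^{3}}\right) = \frac{1}{\bar n_p^{3}}\left\{1 + 6\left(\frac{1}{\bar n_p} - \frac{1}{N}\right) + 25\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{2} + \dots\right\}.$$

Both methods agree to the terms in $\left(\dfrac{1}{\bar n_p} - \dfrac{1}{N}\right)$, but the Gaussian appears to exaggerate in the terms in $\left(\dfrac{1}{\bar n_p} - \dfrac{1}{N}\right)^{2}$.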





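(iii) We will now proceed to approximate on the basis of the moment-coefficients of the binomial. We have
$$S\!\left(\frac{1}{(\bar n_p + \delta n_p)^{s}}\right) = \frac{1}{\bar n_p^{\,s}}\left\{1 - s\,\frac{S(\delta n_p)}{\bar n_p} + \frac{s(s+1)}{1\cdot 2}\,\frac{S(\delta n_p^{2})}{\bar n_p^{2}} - \frac{s(s+1)(s+2)}{1\cdot 2\cdot 3}\,\frac{S(\delta n_p^{3})}{\bar n_p^{3}} + \dots\right\}.$$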


Here S(5n,)=0, and we will keep terms up to the order which involves 


ate 
Np 


proceeding to the fourth moment-coefficient. We find 
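$$S\!\left(\frac{1}{n_p^{\,s}}\right) = \frac{1}{\bar n_p^{\,s}}\Bigg\{1 + \frac{s(s+1)}{1\cdot 2}\left(\frac{1}{\bar n_p} - \frac{1}{N}\right) - \frac{s(s+1)(s+2)}{1\cdot 2\cdot 3}\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)\left(\frac{1}{\bar n_p} - \frac{2}{N}\right)$$

$$+ \frac{s(s+1)(s+2)(s+3)}{1\cdot 2\cdot 3\cdot 4}\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)\left[3\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)\left(1 - \frac{2}{N}\right) + \frac{1}{\bar n_p^{2}}\right]$$

$$- \frac{s(s+1)(s+2)(s+3)(s+4)}{1\cdot 2\cdot 3\cdot 4\cdot 5}\;10\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{2}\left(\frac{1}{\bar n_p} - \frac{2}{N}\right)$$

$$+ \frac{s(s+1)(s+2)(s+3)(s+4)(s+5)}{1\cdot 2\cdot 3\cdot 4\cdot 5\cdot 6}\;15\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{3} - \dots\Bigg\}$$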


as far as terms of cubic order in the curled brackets. 


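Hence we find

$$S\!\left(\frac{1}{n_p}\right) = \frac{1}{\bar n_p}\left\{1 + \left(\frac{1}{\bar n_p} - \frac{1}{N}\right)\left(1 + \frac{1}{N} + \frac{1}{N^2}\right) + 2\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{2}\left(1 + \frac{3}{N}\right) + 6\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{3} + \dots\right\},$$

$$S\!\left(\frac{1}{n_p^{2}}\right) = \frac{1}{\bar n_p^{2}}\left\{1 + \left(\frac{1}{\bar n_p} - \frac{1}{N}\right)\left(3 + \frac{4}{N} + \frac{5}{N^2}\right) + \left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{2}\left(11 + \frac{40}{N}\right) + 50\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{3} + \dots\right\},$$

$$S\!\left(\frac{1}{n_p^{3}}\right) = \frac{1}{\bar n_p^{3}}\left\{1 + \left(\frac{1}{\bar n_p} - \frac{1}{N}\right)\left(6 + \frac{10}{N} + \frac{15}{N^2}\right) + 5\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{2}\left(7 + \frac{30}{N}\right) + 225\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{3} + \dots\right\}.$$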


It will be seen that these values agree to the first term in $\left(\dfrac{1}{\bar n_p} - \dfrac{1}{N}\right)$ with those given by either hypothesis (i) or (ii). For the terms in $\left(\dfrac{1}{\bar n_p} - \dfrac{1}{N}\right)^{2}$ they appear to be intermediate between (i) and (ii). The additional terms which do not occur in either (i) or (ii) are those in powers of $\dfrac{1}{N}$ in the second- and third-order terms.
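As a numerical illustration of the approximation by binomial moment-coefficients, the sketch below compares the third-order series with a direct summation of the truncated binomial, the term $n_p = 0$ being omitted and the result divided by $1 - p^N$ as the text prescribes. It is an illustrative check only: the coefficient table `COEFFS` is transcribed from the three series for $s = 1, 2, 3$, the function names are hypothetical, and logarithms of the binomial terms are used merely to avoid overflow.

```python
from math import exp, lgamma, log


def exact_inverse_moment(N, q, s):
    """Mean of 1/n^s over the binomial (p+q)^N, omitting n = 0, divided by 1 - p^N."""
    p = 1.0 - q
    total = 0.0
    for n in range(1, N + 1):
        log_term = (lgamma(N + 1) - lgamma(n + 1) - lgamma(N - n + 1)
                    + n * log(q) + (N - n) * log(p) - s * log(n))
        total += exp(log_term)
    return total / (1.0 - p ** N)


# (first-order bracket), (second-order bracket), third-order coefficient
COEFFS = {
    1: ((1, 1, 1), (2, 6), 6),
    2: ((3, 4, 5), (11, 40), 50),
    3: ((6, 10, 15), (35, 150), 225),
}


def third_order(nbar, N, s):
    """Series for S(1/n^s) carried to cubic order in e = 1/nbar - 1/N and 1/N."""
    e = 1.0 / nbar - 1.0 / N
    (a0, a1, a2), (b0, b1), c = COEFFS[s]
    series = (1.0 + e * (a0 + a1 / N + a2 / N ** 2)
              + e ** 2 * (b0 + b1 / N) + c * e ** 3)
    return series / nbar ** s


if __name__ == "__main__":
    N, nbar = 1000, 25
    for s in (1, 2, 3):
        print(s, exact_inverse_moment(N, nbar / N, s), third_order(nbar, N, s))
```

For a mean array of 25 in a sample of 1000 the series and the direct sum should agree closely for $s = 1$, the agreement loosening as $s$ rises because the neglected fourth-order terms grow.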
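Using these results we deduce:

$${}_pM_2 = \text{mean}\,(\delta\mu_p')^2 = \frac{{}_p\mu_2}{\bar n_p}\left\{1 + \left(\frac{1}{\bar n_p} - \frac{1}{N}\right)\left(1 + \frac{1}{N} + \frac{1}{N^2}\right) + 2\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{2}\left(1 + \frac{3}{N}\right) + 6\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{3}\right\}.$$

Thus the probable error of the mean of an array of mean size $\bar n_p$ in a sample of $N$ is:

$$\frac{.67449\,\sigma_{n_p}}{\sqrt{\bar n_p}}\left\{1 + \left(\frac{1}{\bar n_p} - \frac{1}{N}\right)\left(1 + \frac{1}{N} + \frac{1}{N^2}\right) + 2\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{2}\left(1 + \frac{3}{N}\right) + 6\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{3}\right\}^{\frac12}.$$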
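Again
$${}_pM_3 = \frac{{}_p\mu_3}{\bar n_p^{2}}\left\{1 + \left(\frac{1}{\bar n_p} - \frac{1}{N}\right)\left(3 + \frac{4}{N} + \frac{5}{N^2}\right) + \left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{2}\left(11 + \frac{40}{N}\right) + 50\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{3}\right\} \qquad \text{(K)},$$
and if $\epsilon = \dfrac{1}{\bar n_p} - \dfrac{1}{N}$,
$${}_pB_1 = \frac{{}_pM_3^{2}}{{}_pM_2^{3}} = \frac{{}_p\beta_1}{\bar n_p}\,\frac{\left\{1 + \epsilon\left(3 + \dfrac{4}{N} + \dfrac{5}{N^2}\right) + \epsilon^2\left(11 + \dfrac{40}{N}\right) + 50\epsilon^3\right\}^{2}}{\left\{1 + \epsilon\left(1 + \dfrac{1}{N} + \dfrac{1}{N^2}\right) + \epsilon^2\left(2 + \dfrac{6}{N}\right) + 6\epsilon^3\right\}^{3}},$$

which after reduction gives
$${}_pB_1 = \frac{{}_p\beta_1}{\bar n_p}\left\{1 + \left(3 + \frac{5}{N} + \frac{7}{N^2}\right)\left(\frac{1}{\bar n_p} - \frac{1}{N}\right) + \left(13 + \frac{56}{N}\right)\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{2} + 69\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{3}\right\}.$$

Clearly if ${}_p\beta_1$ were as large as 0.5, ${}_pB_1$ for an array of 30 would be of the order .02 and thus the array means have an approximately symmetrical distribution.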


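We now turn to the value of ${}_pB_2$, and find with the same value of $\epsilon$ as before

$${}_pM_4 - 3\,{}_pM_2^{2} = \frac{{}_p\mu_4 - 3\,{}_p\mu_2^{2}}{\bar n_p^{3}}\left\{1 + \epsilon\left(6 + \frac{10}{N} + \frac{15}{N^2}\right) + \epsilon^2\left(35 + \frac{150}{N}\right) + 225\epsilon^3\right\}$$

$$+ \frac{3\,{}_p\mu_2^{2}}{\bar n_p^{2}}\left\{\epsilon\left(1 + \frac{2}{N} + \frac{3}{N^2}\right) + \epsilon^2\left(6 + \frac{26}{N}\right) + 34\epsilon^3\right\}.$$

Further,
$$({}_pM_2)^{-2} = \frac{\bar n_p^{2}}{{}_p\mu_2^{2}}\left\{1 - \left(2 + \frac{2}{N} + \frac{2}{N^2}\right)\epsilon - \left(1 + \frac{6}{N}\right)\epsilon^2 - 4\epsilon^3\right\}.$$
Multiplying the previous result by this we have

$${}_pB_2 - 3 = \frac{{}_p\beta_2 - 3}{\bar n_p}\left\{1 + \left(4 + \frac{8}{N} + \frac{13}{N^2}\right)\left(\frac{1}{\bar n_p} - \frac{1}{N}\right) + \left(22 + \frac{112}{N}\right)\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{2} + 145\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{3}\right\}$$

$$+ 3\left\{\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)\left(1 + \frac{2}{N} + \frac{3}{N^2}\right) + 4\left(1 + \frac{5}{N}\right)\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{2} + 21\left(\frac{1}{\bar n_p} - \frac{1}{N}\right)^{3}\right\} \qquad \text{(M)}.$$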


For example, in an array of 25 in a sample of 1000, if ${}_p\beta_2$ were as high as 3.8, we should have ${}_pB_2$ slightly less than 3.2. Accordingly the constant ${}_pB_2$ of an array is not as approximately normal as ${}_pB_1$, or, we have the material thrown out further towards the tails than in the normal distribution.
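The two numerical illustrations can be reproduced from the reduced series for ${}_pB_1$ and ${}_pB_2$. The sketch below is illustrative only; the function names are hypothetical, and the sample size $N = 1000$ used for the array of 30 is an assumption, since the text does not state it for that example.

```python
def eps(nbar, N):
    """The small quantity 1/nbar - 1/N used throughout the expansions."""
    return 1.0 / nbar - 1.0 / N


def array_B1(beta1, nbar, N):
    """B1 of the distribution of the mean of an array of variable size."""
    e = eps(nbar, N)
    series = (1.0 + e * (3 + 5 / N + 7 / N ** 2)
              + e ** 2 * (13 + 56 / N) + 69 * e ** 3)
    return beta1 / nbar * series


def array_B2(beta2, nbar, N):
    """B2 of the distribution of the mean of an array of variable size."""
    e = eps(nbar, N)
    first = (beta2 - 3) / nbar * (1.0 + e * (4 + 8 / N + 13 / N ** 2)
                                  + e ** 2 * (22 + 112 / N) + 145 * e ** 3)
    second = 3 * (e * (1 + 2 / N + 3 / N ** 2)
                  + 4 * e ** 2 * (1 + 5 / N) + 21 * e ** 3)
    return 3.0 + first + second


if __name__ == "__main__":
    print(array_B1(0.5, 30, 1000))   # of the order .02
    print(array_B2(3.8, 25, 1000))   # slightly less than 3.2
```

Even when the sampled array is itself mesokurtic the second bracket does not vanish, so the variation in array size alone throws material towards the tails.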


It is probably, however, adequate to speak of the means of an array of variable size as roughly following a Gaussian curve and give the usual meaning to the "probable error" of the mean of such an array. Its value however is more accurately

$$\frac{.67449\,\sigma_{n_p}}{\sqrt{\bar n_p - 1 + \dfrac{\bar n_p}{N}}} \quad \text{than} \quad .67449\,\sigma_{n_p}/\sqrt{\bar n_p - 1}.$$
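We may adopt a similar process to find the standard-deviation of the second moment of an array in a sample. For an array of constant size $n_p$ we have

$$\text{Mean}\,(\delta\,{}_p\mu_2)^2 = \frac{1}{n_p}\left(1 - \frac{1}{n_p}\right)\left\{\left(1 - \frac{1}{n_p}\right){}_p\mu_4 - \left(1 - \frac{3}{n_p}\right){}_p\mu_2^{2}\right\},$$
where ${}_p\mu_4$ and ${}_p\mu_2$ refer to the values in the sampled population. Thus
$$\text{Mean}\,(\delta\,{}_p\mu_2)^2 = \frac{{}_p\mu_2^{2}}{n_p}\left(1 - \frac{1}{n_p}\right)\left\{\left(1 - \frac{1}{n_p}\right)({}_p\beta_2 - 3) + 2\right\},$$

and, summing for all values of $n_p$,

$$\sigma^2_{{}_p\mu_2} = {}_p\mu_2^{2}\left[2\left\{S\!\left(\frac{1}{n_p}\right) - S\!\left(\frac{1}{n_p^{2}}\right)\right\} + ({}_p\beta_2 - 3)\left\{S\!\left(\frac{1}{n_p}\right) - 2\,S\!\left(\frac{1}{n_p^{2}}\right) + S\!\left(\frac{1}{n_p^{3}}\right)\right\}\right].$$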


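Accordingly we require to find $S\!\left(\dfrac{1}{n_p}\right) - S\!\left(\dfrac{1}{n_p^{2}}\right)$ and $S\!\left(\dfrac{1}{n_p^{2}}\right) - S\!\left(\dfrac{1}{n_p^{3}}\right)$. Writing $\epsilon$ as before $= \dfrac{1}{\bar n_p} - \dfrac{1}{N}$, we have $\dfrac{1}{\bar n_p} = \epsilon + \dfrac{1}{N}$ and after some reductions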
s(5)-8(g)-% wt (at ar cat 
8) 4 (5-8) 16¢}. 


de 
a) 8G) nap tte 
Accordingly if pu. be the second moment of the n, array in a sample of N, 
n= [ft-4-(+8) @-$)- C9) E HE -B 
+4680 {(.-4)~(04 $f) @-$)- (048) BY 


Np 


~10 G “ xt ee (N). 


and further the assumption is made that (6c,,)* 


Re 
W) 
 .F! 
N 





It is usually given the value a 


P 
may be neglected in 2ony OT ny +( 8on,) = dpe, So that we obtain the value 


Now whatever may be said for this result the method by which it is reached is distinctly defective and this not merely because it assumes normality. We have in fact for any distribution of size $M$

$$\sigma = \sqrt{\mu_2}.$$

Now let us measure $\sigma$ from the mean value $\bar\sigma$ of the sampled population and $\mu_2$ from the mean value $\bar\mu_2$ of $\mu_2$ in the samples. Then $\bar\mu_2$ will not be $\bar\sigma^2$, but be equal to $\left(1 - \dfrac{1}{M}\right)\bar\sigma^2$. Accordingly





or expandin 80 = om ( Pe Se an) 

atid =" OM \" * am * aut 6am 
Spe 1 3 5 

Zz (1+ stam rem) 
Spn\2 8 15 
“ate (1+ 59+ ams) 


Gy. 8)-£(+-) 0 


Now we need first the mean value of 8¢ and for this purpose require the mean 


ae 
1 
2 
1 


2 
16 


, 2}. 
powers of sh These will be about the mean value of y, in samples, i.e. (1 - i) Hs 


and are*, if we use curved brackets to represent means : 
Spin\) _ Su.\*) [1 iy. 1 1 3 
(CZ) si (Z)}- Lat (2 az) Be ae (2 -) (2 -7)|- 


{("#y m _ |. — 38-69, +2)— mys — 218, — 188, + 26) 


+ yp (OB.— 398, — 228, + 54) +...], 
Sys\4 1 = ee z =. = a ay 
{(#) > M: [3 (B2— 1? 5 M (B. ot 4B, Neg 158.7 — 248; + 488, + 968, —_ 30) 


- 7 (48, — 408, — 548, — 968, + 3368, + 5288, — 306) + a ...(Q), 


where as usual 
Bor = Herss|fio’*!, and Bors: = Harss X Hs/ jis" **, 
and have reference to the sampled population. 
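Substituting in (P) we have
$$\text{Mean } \sigma = \bar\sigma + \{\delta\sigma\} = \bar\sigma\,(1 - \lambda_0) \text{ say}$$

$$= \bar\sigma\left[1 - \frac{1}{8M}(\beta_2 + 3) + \frac{1}{128M^2}\left(8\beta_4 - 15\beta_2^{2} + 14\beta_2 - 48\beta_1 - 55\right)\right] \qquad \text{(R)}.$$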

* The value for $\left\{\left(\dfrac{\delta\mu_2}{\mu_2}\right)^{2}\right\}$ is well-known, the two later values have been recently given by Professor Tchouproff, Biometrika, Vol. XII, p. 194.
In the special case of a normal distribution this reduces to

$$\text{Mean } \sigma = \bar\sigma\left[1 - \frac{3}{4M} - \frac{7}{32M^2}\right] \qquad \text{(S)}.$$

This agrees with the value given in Biometrika, Vol. X, p. 526, Equation (xv), which is now generalised in (R) for a sample from material following any frequency distribution given by $\beta_1$, $\beta_2$ and $\beta_4$.

We must now adapt the equation (R) for the array $n_p$ of a sample of size $N$. We need only to replace $\dfrac{1}{M}$ and $\dfrac{1}{M^2}$ by $S\!\left(\dfrac{1}{n_p}\right)$ and $S\!\left(\dfrac{1}{n_p^{2}}\right)$ of our p. 273 and retain up to terms in $1/\bar n_p^{2}$. We find

$$\text{Mean } \sigma_{n_p} = \bar\sigma_{n_p}\left[1 - \frac{\beta_2 + 3}{8\,\bar n_p}\left(1 - \frac{1}{N}\right) + \frac{1}{128\,\bar n_p^{2}}\left(8\beta_4 - 15\beta_2^{2} - 2\beta_2 - 48\beta_1 - 103\right)\right] \qquad \text{(T)}.$$
This becomes in the case of normal frequency
$$\text{Mean } \sigma_{n_p} = \bar\sigma_{n_p}\left[1 - \frac{3}{4\,\bar n_p}\left(1 - \frac{1}{N}\right) - \frac{31}{32\,\bar n_p^{2}}\right] \qquad \text{(T*)}.$$
We can now find o, from (P). Subtracting the mean value {8c} = — GA, from dc 
Stags ee 
we have, if \, =A-— om * 3m’ 
: —. a 1 3 
Bo — {80} =7 fr, +52 (1+59 +a) 
1 /8p2\* 3 
- 3) (+ am) 
1 /dme\? 5 3 
+35 ( ) ue (“ eeu (U) 
Hence squaring and taking means we find 
te ; Su,\) 2) 1 
csen fur (CH I} 4) 
Sp. 2 Spt % 
eat +a)* eG} | 
=o? B.— 1 ap ~ 7B +10, — 248, —23)] . (V); 
- — 39 9348: B 2 SUEDE, «<ccesserawecsing : 
_ aVB,-1 1. 1 48.— 7B? + 108, — 248, - 23 a 
or o.= re ieM ORS RRRERATS Taras’ « (W). 
For a normal distribution this becomes* 
1 a 
ee \-sy)= 7 seilallieg 2 tte x 
a at sm) = Ja ed nearly (X) 


* The value is in agreement with that given in Biometrika, Vol. x, p. 526, Equation (xvi). 


. 
Formula (X) shows that even for small values of $M$ we have a value only slightly less than the usual $\bar\sigma/\sqrt{2M}$.

Turning to the array $n_p$ in a sample of $N$ we require the mean values of $S\!\left(\dfrac{1}{n_p^{1/2}}\right)$ and $S\!\left(\dfrac{1}{n_p^{3/2}}\right)$.
3 1 5 . ee es 
s(2) “Felt G3) (++ an) +2559) tea, 
1 1\ 665/11)" 
“ag ee ty = (;- write an) + + 795 (5 - ¥) +...} AY). 


Substituting in (X) we find 








Soy, = THN Ba—1 ( _ 8 __1 48,-782+4B,-248,- 17 | ) D) 
os QV, 8N 167, B.—1 oo" ods 
For a normal distribution we have* 
a. 2 ee x) 
Feny = Jae |( gy t ag.) cree (AA). 


Thus the usual value $\sigma_{n_p}/\sqrt{2\bar n_p}$ will be about 2% in defect in an array of 25 in a sample of 1000.

It will be realised therefore that if we do not take arrays of less than 25 in samples of 1000, the usual values of the mean standard-deviation of an array $n_p$ and the standard-deviation of these standard-deviations will not lead us badly astray. We have finally to ask what degree of weight we can give to the "probable error" of this standard-deviation, i.e. to $.67449\,\sigma_{\sigma_{n_p}}$. This is only determining how far $\sigma_{n_p}$ follows the normal law of distribution, that is to say, how nearly are $B_1'$ and $B_2' - 3$ zero for the distribution of $\sigma_{n_p}$, these representing the first two $\beta$-coefficients for this distribution. Before we do this, however, we may find from (Q), p. 276, the values of ${}_MB_1'$, ${}_MB_2'$ for the $\mu_2$ of samples of constant size $M$ to the second order of approximation. They are

pa 1 (B= 382 - 68, + 2) 1 $2) ae 188, +2 =5)) 
MP; — + at 
Mo (&-1) M\B,-1 — 3B,—98,-188,+6 


reducing with a Gaussian distribution tot 
“i = 1 
ait’ ss m( i. uy) eee ee (CC), 


is not the same thing as if we put 


- 1 Cy f ae 1) 
c>. = C... = ~ ont =a = - = 
Inp 2on,’ 2 ,/2n, 


* The reader should note that Fens 


from (N) on p. 275. 

+ These values are identical to the degree of approximation adopted with those given by “ Student,” 
in Biometrika, Vol. v1, p. 4. Professor Tchouproff (Ibid., Vol. xm, p. 192) does scant justice to 
‘‘ Student.” The only misprint we can see is that 1/n appears in the first term instead of 1/n', the 
power having probably been ‘ drawn’ in printing ; it re-appears in the next equation. Further ‘ Student” 
gives not only Tchouproff’s (19) but his (22); this as ‘‘ Student” himself indicates (p. 4) involved 
T'chouproff’s lengthy equations (20) and (21), which ‘he refrained owing to their length from publishing. 
and uB/ -3= Mi 0— 4B. — 88? 248; ve 128, + 968, ee (DD), 
(B.—1) 
reducing with a Gaussian distribution to* 
xB -3 =". 


It will be observed that the approach to the normal curve is by no means close for fairly small samples. For example, if $M = 24$, we might easily have ${}_MB_1' = .3$ and ${}_MB_2' = 3.5$. In other words the distribution of $\mu_2$ in samples of size $M$ is far from as close to a normal curve as the distribution of means $\mu'$.

We should anticipate accordingly that the distribution of the second moment-coefficient in the $n_p$ array of a sample $N$ would be even further removed. We find by replacing $1/M^{s}$ by $S\!\left(\dfrac{1}{n_p^{\,s}}\right)$ that


( duts\*} —— A 1 B, — 3) 























3 ea N Np B, ak oar" 
(#) — 38, - TL (eae 4 38,-5 .) 
Ry" Np B, = 3B, = 6B, +2 
{ue ya! B.- 1 (g_ 9, 1 Bi 48, 6B? — 248, +308, + 968,— 21 
( Ny? ; xt (Bz a 1B 
Peat ckeade eceuauen (EE). 
Hence 
B,—§ 2f, 8 Si +1 . 
pind Act B+2, 9,1 0B, 8@h-9 
Np (Bs = 1)° ( N "p B.— 1 B, Toe 3B. ce 68, Se 2 
NG Ses: (FF), 
giving for a Gaussian distribution 
yee, ¥ S a 
npB, pas rp \4 lip y) ’ 
and Bat — Bian 3 ee PO ee Bc (GQ), 
iy (8. — 1) N 
giving for a Gaussian distribution 
ee ee ee 
ia ia i a a 


Thus for a small array of, say, 25 we might easily have ${}_{n_p}B_1' = .37$ and ${}_{n_p}B_2' = 3.6$, values very remote from a Gaussian distribution.

It is clear therefore that the "probable error" of a second moment-coefficient has no very illuminating meaning in the case of the arrays of small or even moderate size in the case of a sample of size $N$. It may, however, be remarked that the distribution of ${}_{n_p}\mu_2$ is one thing and that of $\sigma_{n_p}$, which is what we usually require, is another. In order to obtain this we must raise the expression in (U)


* See the second footnote on preceding page. 
for $\delta\sigma - \{\delta\sigma\}$ to the third and fourth powers. But if we keep only terms of the two lowest orders in our results we require to ascertain the values of


ay} =i) 


The necessary term in the latter a ° (Pa- 1), but the finding of the former is 


n iy? 
far more troublesome and we have a so far succeeded in determining it. But it 
is possible to obtain some idea of the deviation of the o,, curve from normality by 
considering what its §’s are in the case of the sample being made on a population 
following the normal law. We are able to do this, if we carry a stage further the 
work outlined in a paper in Biometrika, Vol. x, p. 526. We require the values of 
>, ox? and ys, w, on p. 526 carried to a higher approximation by the introduction 
of the additional term in the Stirling’s Theorem expression for the factorial. 
Miss Pairman has carried this out and finds 


= PS. 3 
Sao (1-5 + sat ga) 5 et ete (HH) 
which leads, having regard to (xi) on p. 525, to 
ae. 
spe = 3? = fl, — - Y= an (1 - i a) ee eccccccecevccees (II). 
We must now turn to the equations on pp. 527—8 to determine yp, and py. 
We find* : 5 : 
o 3 o 3 3 9 
w= Fe (1+ gq) ~ dae (1 + an) (1 — a SB TBR) 
o 3 eo nee 
? 7 (1 + ;) as far as our approximation is valid......... (KK). 
, 5 1 3\/1 3 3/1 3\? 
agun mmoles 3 (4 =3) (as * ae) -mletae)} 
3a* 1 
=i (1 ‘ =) ek el ea ein es eee (LL) 


We must now replace n by fip+6n, and sum as we have frequently had 
occasion to do. 


aw 7 (1 4 2 i) 
We have: Ong, oi, (1 + eae ree onerers (MM), 
ory 15 3 
Tnpus — 4ii,? (1+ + i, w) eee ereeeeereeeeseeeseees (NN), 
on 8% (yy © _ 3 
Can Gt (1+ = By inte (00). 


* These values may be used to determine the nature of the distribution of = in samples of constant 
sizen. We have: 


1 9 0 
2B,= 5, (1+7,); 2B2=3+-. 
The term in 1/n? in sBy could not be determined unless we went to a still higher order term in 2. 


Clearly for a sample of 25 =B, approaches close to the Gaussian and =B, still closer. The non-approxi- 
mate values are ‘0219 and 3°0014 (loc. cit., p. 529). 
1 21-3 
Hence i B, = mn, (a + 4m, = y) ec recccceccceccccccccene (PP), 
: hae. | 
7, B= 3 +8 5 a a) er eee eek (QQ) 


Accordingly we see that when samples $N$ are taken from normal material the array $n_p$ of varying size in these samples will not differ very greatly from normality. For example if $\bar n_p = 25$ and the sample $N$ be 1000, we shall have ${}_{\sigma_{n_p}}B_1 = .024$ and ${}_{\sigma_{n_p}}B_2 = 3.12$, showing no great deviation from normality, although more than in the case of a sample of constant size. It is probable that the deviation from normality will be somewhat greater when the sampled population itself is not normal. Still it is important to note that the distribution of $\sigma_{n_p}$ for all cases is likely to be far closer to the normal, and the "probable error" of $\sigma_{n_p}$ therefore more intelligible, than is the case with ${}_{n_p}\mu_2$. In the same way it is extremely probable that the distributions of $({}_{n_p}\mu_3)^{\frac13}$ and $({}_{n_p}\mu_4)^{\frac14}$ are more nearly normal than those of ${}_{n_p}\mu_3$ and ${}_{n_p}\mu_4$.


(V) Mathematical Contributions to the Theory of Evolution. x1x. Second Sup- 
plement to a Memoir on Skew Variation. Phil. Trans. Series A—Vol. 216, 
pp. 429—457. 

There are one or two corrections to be made in this paper by Pearson:

(a) p. 439, l. 18. The printer has drawn the solidus and the $3\beta_1 - 2\beta_2 + 6$ which followed it. Thus
$$4 - m = 2\,(\beta_2 + 3)$$
should be read as
$$4 - m = 2\,(\beta_2 + 3)/(3\beta_1 - 2\beta_2 + 6);$$
(b) p. 441, 1. 18, about the middle of the page the equation 
ry = 12 (sec 8 — cosec 8) 
is given. It should be 
yy =3 x 12 (sec @ — cosec 0)*/sec 8, 


but no use has been made of the equation in the paper. 
QUADRATURE COEFFICIENTS.


In a large amount of recent quadrature work we have found that Sheppard's formula (c) given in Biometrika, I, p. 276, gives very satisfactory results, and Mr P. F. Everitt has tabled the values of the three coefficients. His manuscript table has proved so useful that we reproduce it here, as others may also find it a help. It will eventually appear in the Tables for Statisticians and Biometricians. The formula supposes the quadrated area to be divided into $p$ trapezettes on bases of equal size $h$. Then $A_c$ the chordal area is given by

$$A_c = h\left(\tfrac12 z_0 + z_1 + z_2 + \dots + z_{p-1} + \tfrac12 z_p\right),$$
where $z_0, z_1, z_2, \dots, z_{p-1}, z_p$ are the equally spaced ordinates.

The required area of the curve is then

$$\text{Area} = A_c + C_1\,\{(z_1 - z_0) - (z_p - z_{p-1})\}\,h$$
$$\qquad\qquad - C_2\,\{(z_2 - z_1) - (z_{p-1} - z_{p-2})\}\,h$$
$$\qquad\qquad + C_3\,\{(z_3 - z_2) - (z_{p-2} - z_{p-3})\}\,h.$$

Here $C_1$, $C_2$, $C_3$ are certain functions of $p$ and are provided for each value of $p$ in the accompanying Table. They are selected to give the best result, provided we stop at third terminal differences.
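The use of the tabled coefficients may be sketched as follows. This is an illustration under stated assumptions rather than Sheppard's own procedure: the three terminal corrections are read as successive first differences at each end of the range (a reading under which $C_1 - C_2 + C_3$ comes out close to $1/12$ for the larger tabled $p$), the function names are hypothetical, and the coefficients in the usage example are those tabled for $p = 60$.

```python
import math


def chordal_area(z, h):
    """Trapezoidal (chordal) area for equally spaced ordinates z[0..p]."""
    p = len(z) - 1
    return h * (0.5 * z[0] + sum(z[1:p]) + 0.5 * z[p])


def sheppard_area(z, h, c1, c2, c3):
    """Chordal area plus three terminal-difference corrections.

    c1, c2, c3 must be the tabled coefficients for p = len(z) - 1.
    """
    p = len(z) - 1
    if p < 6:
        raise ValueError("the table begins at p = 6")
    d1 = (z[1] - z[0]) - (z[p] - z[p - 1])
    d2 = (z[2] - z[1]) - (z[p - 1] - z[p - 2])
    d3 = (z[3] - z[2]) - (z[p - 2] - z[p - 3])
    return chordal_area(z, h) + h * (c1 * d1 - c2 * d2 + c3 * d3)


if __name__ == "__main__":
    p = 60
    h = 1.0 / p
    z = [math.exp(i * h) for i in range(p + 1)]   # ordinates of e^x on [0, 1]
    exact = math.e - 1.0
    print(chordal_area(z, h) - exact)
    print(sheppard_area(z, h, 0.1552559, 0.1025223, 0.0306122) - exact)
```

For a straight line all three bracketed differences vanish, so the corrected area coincides with the chordal area, as it should.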
Table of the Coefficients $C_1$, $C_2$, $C_3$ (all positive).

  p |    C1    |    C2    |    C3    ||   p |    C1    |    C2    |    C3
  6 | .2071429 | .3357143 | .4714286 ||  54 | .1557006 | .1034971 | .0311473
  7 | .1957755 | .2532407 | .2108218 ||  55 | .1556195 | .1033186 | .0310489
  8 | .1883296 | .2124868 | .1369312 ||  56 | .1555414 | .1031470 | .0309545
  9 | .1830517 | .1882653 | .1037946 ||  57 | .1554662 | .1029820 | .0308639
 10 | .1791068 | .1722411 | .0854119 ||  58 | .1553936 | .1028231 | .0307767
 11 | .1760430 | .1608670 | .0738621 ||  59 | .1553235 | .1026699 | .0306929
 12 | .1735931 | .1523810 | .0659864 ||  60 | .1552559 | .1025223 | .0306122
 13 | .1715884 | .1458097 | .0602962 ||  61 | .1551906 | .1023799 | .0305345
 14 | .1699171 | .1405724 | .0560045 ||  62 | .1551274 | .1022424 | .0304596
 15 | .1685022 | .1363012 | .0526583 ||  63 | .1550663 | .1021096 | .0303874
 16 | .1672888 | .1327519 | .0499796 ||  64 | .1550071 | .1019813 | .0303177
 17 | .1662365 | .1297561 | .0477890 ||  65 | .1549498 | .1018571 | .0302503
 18 | .1653151 | .1271939 | .0459655 ||  66 | .1548944 | .1017370 | .0301852
 19 | .1645017 | .1249776 | .0444247 ||  67 | .1548406 | .1016207 | .0301223
 20 | .1637782 | .1230418 | .0431061 ||  68 | .1547884 | .1015081 | .0300614
 21 | .1631305 | .1213364 | .0419654 ||  69 | .1547378 | .1013990 | .0300025
 22 | .1625472 | .1198227 | .0409690 ||  70 | .1546887 | .1012931 | .0299445
 23 | .1620193 | .1184701 | .0400913 ||  71 | .1546410 | .1011905 | .0298902
 24 | .1615391 | .1172542 | .0393126 ||  72 | .1545946 | .1010909 | .0298366
 25 | .1611001 | .1161553 | .0386169 ||  73 | .1545496 | .1009941 | .0297846
 26 | .1606982 | .1151573 | .0379919 ||  74 | .1545058 | .1009002 | .0297341
 27 | .1603280 | .1142469 | .0374272 ||  75 | .1544632 | .1008089 | .0296852
 28 | .1599861 | .1134131 | .0369147 ||  76 | .1544218 | .1007202 | .0296376
 29 | .1596694 | .1126466 | .0364473 ||  77 | .1543814 | .1006339 | .0295914
 30 | .1593752 | .1119396 | .0360195 ||  78 | .1543422 | .1005499 | .0295465
 31 | .1591013 | .1112855 | .0356265 ||  79 | .1543039 | .1004682 | .0295029
 32 | .1588455 | .1106784 | .0352641 ||  80 | .1542666 | .1003886 | .0294604
 33 | .1586062 | .1101136 | .0349290 ||  81 | .1542303 | .1003111 | .0294190
 34 | .1583817 | .1095867 | .0346181 ||  82 | .1541948 | .1002356 | .0293788
 35 | .1581708 | .1090941 | .0343290 ||  83 | .1541603 | .1001621 | .0293396
 36 | .1579723 | .1086325 | .0340595 ||  84 | .1541265 | .1000903 | .0293015
 37 | .1577850 | .1081991 | .0338076 ||  85 | .1540936 | .1000204 | .0292643
 38 | .1576082 | .1077914 | .0335717 ||  86 | .1540615 | .0999521 | .0292280
 39 | .1574408 | .1074072 | .0333502 ||  87 | .1540301 | .0998856 | .0291926
 40 | .1572822 | .1070444 | .0331420 ||  88 | .1539995 | .0998206 | .0291582
 41 | .1571317 | .1067014 | .0329458 ||  89 | .1539695 | .0997571 | .0291245
 42 | .1569887 | .1063765 | .0327607 ||  90 | .1539403 | .0996951 | .0290917
 43 | .1568527 | .1060684 | .0325857 ||  91 | .1539116 | .0996346 | .0290596
 44 | .1567231 | .1057759 | .0324201 ||  92 | .1538837 | .0995754 | .0290283
 45 | .1565996 | .1054976 | .0322630 ||  93 | .1538563 | .0995175 | .0289977
 46 | .1564816 | .1052328 | .0321139 ||  94 | .1538295 | .0994610 | .0289678
 47 | .1563688 | .1049803 | .0319722 ||  95 | .1538034 | .0994057 | .0289386
 48 | .1562610 | .1047393 | .0318373 ||  96 | .1537777 | .0993516 | .0289101
 49 | .1561577 | .1045091 | .0317088 ||  97 | .1537526 | .0992987 | .0288821
 50 | .1560587 | .1042890 | .0315861 ||  98 | .1537280 | .0992469 | .0288548
 51 | .1559637 | .1040783 | .0314690 ||  99 | .1537040 | .0991962 | .0288281
 52 | .1558725 | .1038765 | .0313571 || 100 | .1536804 | .0991465 | .0288020
 53 | .1557849 | .1036829 | .0312449 ||














ON GENERALISED TCHEBYCHEFF THEOREMS IN THE 
MATHEMATICAL THEORY OF STATISTICS. 


By KARL PEARSON, F.R.S.


(1) Single Variate. 


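Let $y = \phi(x)$ be any law of frequency and let the limits of the distribution be $a$ and $b$, then if $N$ be the total frequency,

$$N = \int_a^b \phi(x)\,dx,$$

and if $\bar x$ be the mean value of the variate,
$$N\bar x = \int_a^b x\,\phi(x)\,dx.$$
Generally, if $\mu_s$ be the $s$th moment-coefficient about the mean,
$$N\mu_s = \int_a^b (x - \bar x)^s\,\phi(x)\,dx.$$

Now consider
$$\mu_{2s} = \frac{1}{N}\int_a^b (x - \bar x)^{2s}\,\phi(x)\,dx,$$

and let $\epsilon$ be any value of $x - \bar x$, then

$$\frac{\mu_{2s}}{\epsilon^{2s}} = \frac{1}{N}\int_a^b \left(\frac{x - \bar x}{\epsilon}\right)^{2s}\phi(x)\,dx.$$

Now pick out all the values for which $x - \bar x$ is numerically greater than $\epsilon$, and let us suppose $b > a$; then

$$\frac{\mu_{2s}}{\epsilon^{2s}} > \frac{1}{N}\left[\int_a^{\bar x - \epsilon} + \int_{\bar x + \epsilon}^{b}\right]\left(\frac{x - \bar x}{\epsilon}\right)^{2s}\phi(x)\,dx,$$

and therefore
$$\frac{\mu_{2s}}{\epsilon^{2s}} > \frac{1}{N}\left[\int_a^{\bar x - \epsilon} + \int_{\bar x + \epsilon}^{b}\right]\phi(x)\,dx,$$
since $(x \sim \bar x)/\epsilon$ is always greater than unity over these ranges.

But $\dfrac{1}{N}\left[\displaystyle\int_a^{\bar x - \epsilon} + \int_{\bar x + \epsilon}^{b}\right]\phi(x)\,dx$ is the chance of an individual occurring with a deviation greater than $\epsilon$ from the mean $= 1 - P$, where $P$ is the chance of an individual occurring with a deviation less than $\epsilon$. Hence

$$P > 1 - \frac{\mu_{2s}}{\epsilon^{2s}}.$$

Now let $\epsilon = \lambda\sigma$, where $\sigma = \sqrt{\mu_2}$ is the standard deviation of the distribution.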
Thus the chance of a deviation being of less magnitude than $\lambda\sigma$ is

$$P > 1 - \frac{\mu_{2s}}{\lambda^{2s}\mu_2^{\,s}} \qquad \text{(i)}.$$
If we put $s = 1$, the chance $P$ of a deviation less than $\lambda\sigma$ is limited by
$$P > 1 - \frac{1}{\lambda^{2}} \qquad \text{(ii)}.$$

This special case is Tchebycheff's Theorem*.

Inequality (i) gives our first generalisation for a single variate of Tchebycheff's Theorem in (ii)†. We can now compare the accuracy of (i) and (ii) by supposing them applied to a normal distribution of frequency for the cases of deviations 1.5, 2, 3 and 4 times the standard deviation. In this case

$$\mu_{2s} = (2s - 1)(2s - 3)\dots 1\cdot\mu_2^{\,s}.$$











TABLE I.

Values of Lower Limit for $P$ given by $1 - \dfrac{(2s-1)(2s-3)\dots 1}{\lambda^{2s}}$.

  s                 | λ = 1.5 | λ = 2 | λ = 3 | λ = 4
  1                 |  .5556  | .7500 | .8889 | .9375
  2                 |  .4074  | .8125 | .9630 | .9883
  3                 | −.3169  | .7656 | .9794 | .9963
  4                 |    —    |   —   | .9840 | .9984
  5                 |    —    |   —   | .9840 | .9991
  6                 |    —    |   —   | .9804 | .9994
  7                 |    —    |   —   |   —   | .99950
  8                 |    —    |   —   |   —   | .99953
  9                 |    —    |   —   |   —   | .99950
 10                 |    —    |   —   |   —   | .99940
  Actual value of P |  .8664  | .9545 | .9973 | .99994

Clearly the maximum for any $\lambda$ will be found by making $(2s-1)/\lambda^2$ equal to unity, or if $\lambda^2$ = an odd number, $s = \tfrac12(\lambda^2 + 1)$ and $\tfrac12(\lambda^2 + 1) - 1$ will give equal limits. If $\lambda^2$ be an even number then $s = \tfrac12\lambda^2$ will give the highest limit.
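The entries of Table I follow directly from inequality (i) with the normal moment relation, and may be recomputed as a check. The sketch below is illustrative; the function names are hypothetical, and the actual normal probability is taken from the error function of the standard library.

```python
from math import erf, sqrt


def odd_product(s):
    """(2s-1)(2s-3)...1, the ratio mu_2s / mu_2^s for a normal distribution."""
    result = 1
    for k in range(1, s + 1):
        result *= 2 * k - 1
    return result


def lower_limit(s, lam):
    """Generalised Tchebycheff lower limit for P at lam standard deviations."""
    return 1.0 - odd_product(s) / lam ** (2 * s)


def best_order(lam, max_s=30):
    """Order s giving the highest lower limit (the first such s on a tie)."""
    return max(range(1, max_s + 1), key=lambda s: lower_limit(s, lam))


def actual_normal(lam):
    """Actual chance of a normal deviation numerically less than lam * sigma."""
    return erf(lam / sqrt(2.0))


if __name__ == "__main__":
    for lam in (1.5, 2.0, 3.0, 4.0):
        s = best_order(lam)
        print(lam, s, round(lower_limit(s, lam), 5), round(actual_normal(lam), 5))
```

For $\lambda = 3$ the orders $s = 4$ and $s = 5$ give the same limit, and for $\lambda = 4$ the highest limit falls at $s = 8$, in accordance with the rule just stated.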


* It was first proved in the Recueil des sciences mathématiques, T. II, according to Liouville, but I cannot trace this reference at all. It was translated from Russian into French in Liouville's Journal de mathématiques, Vol. XII, pp. 177—184, Paris, 1867. The proof there given is somewhat lengthy and at first sight the result might appear more general than (ii); but this is not so. Assume $r = u + v + w + \dots$ and suppose $u$, $v$, $w$ uncorrelated, so that $\sigma_r^2 = \sigma_u^2 + \sigma_v^2 + \sigma_w^2 + \dots$; then we have with minor differences of notation and terminology (especially the use of the words "mathematical expectation" for our moments) Tchebycheff's own phrasing of his theorem. The remark of Dr Anderson (Biometrika, Vol. X, p. 269) with regard to the neglect of the theory of "mathematical expectation" by the English statistical school seems based on a misunderstanding of the moment method.

+ This generalised form of Tchebycheff’s Theorem was given by me in a paper for the Honours 
degree of the University of London in Statistics, October, 1915. 
(2) Two Variates; Limit to the Frequency within an Elliptic Area round the Mean as Centre.

Let the law of frequency be $z = \phi(x, y)$ and let the standard deviations of $x$ and $y$ be $\sigma_1$, $\sigma_2$, and $r$ be the coefficient of correlation between $x$ and $y$. Let us take as our ellipse,

$$\chi^2 = \frac{1}{1 - r^2}\left(\theta_{11}\frac{x^2}{\sigma_1^2} - 2\theta_{12}\,r\,\frac{xy}{\sigma_1\sigma_2} + \theta_{22}\frac{y^2}{\sigma_2^2}\right),$$

$x$ and $y$ being measured as deviations from the mean.

Then by giving special values to $\theta_{11}$, $\theta_{12}$, $\theta_{22}$ and $\chi^2$ we can get any ellipse we please. Further since the curve is to be an ellipse $r^2\theta_{12}^2 < \theta_{11}\theta_{22}$* and we shall take $\theta_{11}$ and $\theta_{22}$ always positive. Thus $\chi^2$ and all its powers will invariably be positive.

Now consider, if $N = \iint \phi(x, y)\,dx\,dy$,
$$I_s = \frac{1}{N}\iint \phi(x, y)\,\chi^{2s}\,dx\,dy,$$

the integration extending all over space covered by the frequency surface. Divide both sides by $\chi_0^{2s}$,

$$\frac{I_s}{\chi_0^{2s}} = \frac{1}{N}\iint \phi(x, y)\left(\frac{\chi}{\chi_0}\right)^{2s} dx\,dy.$$

Take out all the values for which $\chi$ is greater than $\chi_0$, then

$$\frac{I_s}{\chi_0^{2s}} > \frac{1}{N}\iint \phi(x, y)\left(\frac{\chi}{\chi_0}\right)^{2s} dx\,dy,$$
when the integral extends over the area for which $\chi$ is $> \chi_0$. Hence

$$\frac{I_s}{\chi_0^{2s}} > \frac{1}{N}\iint \phi(x, y)\,dx\,dy,$$
that is, $I_s/\chi_0^{2s}$ > chance of an observation falling outside the ellipse $\chi_0$.

Let $P$ be the chance of an observation falling inside this ellipse, then we have at once

$$P > 1 - \frac{I_s}{\chi_0^{2s}} \qquad \text{(iii)}.$$

Now we define

$$p_{ss'} = \frac{1}{N}\iint \phi(x, y)\,(x - \bar x)^{s}(y - \bar y)^{s'}\,dx\,dy = \frac{1}{N}\iint \phi(x, y)\,x^{s}y^{s'}\,dx\,dy$$

in our case, as the $s, s'$th product moment-coefficient about the mean. And it is very convenient to write

$$q_{ss'} = p_{ss'}/(\sigma_1^{\,s}\sigma_2^{\,s'}) \qquad \text{(iv)}$$
and term $q_{ss'}$ a reduced product moment-coefficient.

* We shall generally wish to have symmetry of expression between $x$ and $y$, and in this case we take $\theta_{22} = \theta_{11} = \theta$ say and write $\theta_{12}\,r/\theta = \rho$ and we shall have as necessary condition for the ellipse $\rho < 1$.














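It is clear that by simple expansion of the trinomial expression, we can always find $I_s$ in terms of $q_{ss'}$.

We have accordingly to study the expansion of
$$\frac{1}{(1 - r^2)^s}\left(\theta_{11}\frac{x^2}{\sigma_1^2} - 2r\theta_{12}\frac{xy}{\sigma_1\sigma_2} + \theta_{22}\frac{y^2}{\sigma_2^2}\right)^{s}$$
$$= \frac{1}{(1 - r^2)^s}\sum_{u=0}^{u=s}\;\sum_{m=0}^{m=s-u}(-1)^u\,2^u r^u\,\theta_{12}^{\,u}\,\theta_{11}^{\,s-m-u}\,\theta_{22}^{\,m}\,\frac{s!}{(s-u-m)!\,m!\,u!}\left(\frac{x}{\sigma_1}\right)^{2s-u-2m}\left(\frac{y}{\sigma_2}\right)^{u+2m},$$

and if this value be substituted in the integral expression for $I_s$ we find

$$I_s = \frac{1}{(1 - r^2)^s}\sum_{u=0}^{u=s}\;\sum_{m=0}^{m=s-u}(-1)^u\,2^u r^u\,\theta_{12}^{\,u}\,\theta_{11}^{\,s-m-u}\,\theta_{22}^{\,m}\,\frac{s!}{(s-u-m)!\,m!\,u!}\;q_{2s-u-2m,\;u+2m} \qquad \text{(v)}.$$
The lower values can be equally readily found by the expansion of
$$\left(\theta_{11}\frac{x^2}{\sigma_1^2} - 2\theta_{12}\,r\,\frac{xy}{\sigma_1\sigma_2} + \theta_{22}\frac{y^2}{\sigma_2^2}\right)^{s}$$
in powers of $r$ by aid of the binomial.
The first few cases are
$$I_1 = \frac{1}{1 - r^2}\left\{\theta_{11}q_{20} - 2\theta_{12}\,r\,q_{11} + \theta_{22}q_{02}\right\},$$
$$I_2 = \frac{1}{(1 - r^2)^2}\left\{\theta_{11}^2 q_{40} + \theta_{22}^2 q_{04} + 2\theta_{11}\theta_{22}q_{22} - 4\theta_{12}\,r\,(\theta_{11}q_{31} + \theta_{22}q_{13}) + 4\theta_{12}^2 r^2 q_{22}\right\},$$
$$I_3 = \frac{1}{(1 - r^2)^3}\big\{\theta_{11}^3 q_{60} + \theta_{22}^3 q_{06} + 3\theta_{11}\theta_{22}(\theta_{11}q_{42} + \theta_{22}q_{24})$$
$$\qquad - 6\theta_{12}\,r\,(\theta_{11}^2 q_{51} + \theta_{22}^2 q_{15} + 2\theta_{11}\theta_{22}q_{33})$$
$$\qquad + 12\theta_{12}^2 r^2(\theta_{11}q_{42} + \theta_{22}q_{24}) - 8\theta_{12}^3 r^3 q_{33}\big\},$$
$$I_4 = \frac{1}{(1 - r^2)^4}\big\{\theta_{11}^4 q_{80} + \theta_{22}^4 q_{08} + 4\theta_{11}\theta_{22}(\theta_{11}^2 q_{62} + \theta_{22}^2 q_{26}) + 6\theta_{11}^2\theta_{22}^2 q_{44}$$
$$\qquad - 8\theta_{12}\,r\,(\theta_{11}^3 q_{71} + 3\theta_{11}\theta_{22}(\theta_{11}q_{53} + \theta_{22}q_{35}) + \theta_{22}^3 q_{17})$$
$$\qquad + 24\theta_{12}^2 r^2(\theta_{11}^2 q_{62} + \theta_{22}^2 q_{26} + 2\theta_{11}\theta_{22}q_{44})$$
$$\qquad - 32\theta_{12}^3 r^3(\theta_{11}q_{53} + \theta_{22}q_{35}) + 16 r^4\theta_{12}^4 q_{44}\big\} \qquad \text{(vi)}.$$

These expressions simplify for various cases, but it is clear that for the general case of unknown type of distribution we shall have to find very high product moments from the observations in order to use our generalised Tchebycheff's Theorem. Otherwise we shall have to make assumptions as to the relations between high order and low order $q$'s.

Since generally $q_{20} = q_{02} = 1$ and $q_{11} = r$, we have

$$I_1 = \frac{1}{1 - r^2}\left(\theta_{11} + \theta_{22} - 2\theta_{12}r^2\right).$$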

This suggests that for all cases we are likely to get simplified results, if we take $\theta_{11} = \theta_{22} = \theta_{12} = 1$ when we find $I_1 = 2$. In other words, simplification arises if we make our ellipse that of the normal contours, although of course for the general case this will not be a contour of equal probability, although it may roughly approximate to it.

Thus we find for this case,
$$I_1 = 2,$$
$$I_2 = \frac{1}{(1 - r^2)^2}\left\{q_{40} + q_{04} + 2q_{22} - 4r\,(q_{31} + q_{13}) + 4r^2 q_{22}\right\},$$
$$I_3 = \frac{1}{(1 - r^2)^3}\left\{q_{60} + q_{06} + 3(q_{42} + q_{24}) - 6r\,(q_{51} + q_{15} + 2q_{33}) + 12r^2(q_{42} + q_{24}) - 8r^3 q_{33}\right\},$$
$$I_4 = \frac{1}{(1 - r^2)^4}\big\{q_{80} + q_{08} + 4(q_{62} + q_{26}) + 6q_{44} - 8r\,(q_{71} + q_{17} + 3q_{53} + 3q_{35})$$
$$\qquad + 24r^2(q_{62} + q_{26} + 2q_{44}) - 32r^3(q_{53} + q_{35}) + 16r^4 q_{44}\big\} \qquad \text{(vii)},$$

and the general value of $I_s$ will be

$$I_s = \frac{1}{(1 - r^2)^s}\sum_{u=0}^{u=s}\;\sum_{m=0}^{m=s-u}(-1)^u\,2^u r^u\,\frac{s!}{(s-u-m)!\,m!\,u!}\;q_{2s-u-2m,\;u+2m}.$$

For the case of a normal distribution the $q$'s are all given in terms of $r$ (Biometrika, Vol. XII, p. 87) and on substitution we find

$$I_1 = 2, \quad I_2 = 8, \quad I_3 = 48, \quad I_4 = 384;$$
generally $I_s = 2s\,(2s - 2)(2s - 4)\dots 2$, which can be shown directly, thus:

$$I_s = \frac{1}{N}\iint \phi(x, y)\,\chi^{2s}\,dx\,dy = \int_0^{\infty} e^{-\frac12\chi^2}\chi^{2s}\,\chi\,d\chi = 2s\,I_{s-1},$$

if we integrate by parts,
$$= 2s\,(2s - 2)(2s - 4)\dots 2 \times \int_0^{\infty} e^{-\frac12\chi^2}\chi\,d\chi$$
$$= 2s\,(2s - 2)(2s - 4)\dots 2.$$

Accordingly our generalised Tchebycheff's limit becomes

$$P > 1 - \frac{2s\,(2s - 2)(2s - 4)\dots 2}{\chi_0^{2s}} \qquad \text{(viii)*}$$

and our best value of $s$ will be determinable from $2s \le \chi_0^2$, or $s$ must be the greatest integer less than or the integer equal to $\tfrac12\chi_0^2$.


Now the actual volume of the frequency surface inside the contour
$$\chi_0^2 = \frac{1}{1 - r^2}\left(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)$$

is known to be $1 - e^{-\frac12\chi_0^2}$, and it is thus easy to test the present generalised Tchebycheff limit as applied to this case.

* This result is almost at once extensible to any number of variates following the normal distribution, but as the actual value of the probability is known there is no value in writing down this limiting value.
TABLE II.

Generalised Tchebycheff Limit applied to the Probability that an association of two variables lies inside a given contour $\chi_0^2$ of a normal frequency surface.

  χ₀²  | Actual Probability | Minimum value of P
   4   |       .8647        |  .5000  (I₂)
   5   |       .9179        |  .6800  (I₂)
   6   |       .9502        |  .7778  (I₃)
   7   |       .9698        |  .8600  (I₃)
   8   |       .9817        |  .9062  (I₄)
   9   |       .9889        |  .9415  (I₄)
  10   |       .9933        |  .9616  (I₅)
  12   |       .9975        |  .9846  (I₆)
  14   |       .9991        |  .9939  (I₇)
  16   |       .9997        |  .9976  (I₈)
  18   |       .9999        |  .9991  (I₉)
  20   |       .99995       |  .99964 (I₁₀)













Here as in the case of a single variate the generalised Tchebycheff limit is not very useful for low values of $\chi_0^2$. But if in any particular type of observation we consider it desirable to look with suspicion on an observation which has occurred and yet the odds against which are greater than 50 to 1, the Tchebycheff limit may be of value. As illustration, suppose two variates are correlated with intensity .7, what suspicion should we cast on an observation which gave the deviation of one variate 3.8 times its standard deviation and of the other 3.2 times? Here

$$\chi_0^2 = \frac{1}{1 - r^2}\left(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)$$

$$= \frac{1}{.51}\left\{(3.8)^2 - 1.4\,(3.8)(3.2) + (3.2)^2\right\}$$
$$= 15.01, \text{ or say } 15.$$

Then
$$P > 1 - \frac{I_7}{(15)^7} > .9962,$$

or the odds are greater than 250 to 1 against it. Actually the probability against the occurrence of anything as unusual as or more unusual than this is .9994, or the actual odds 1700 to 1 about. For many purposes the odds of 250 to 1 would amply suffice to mark suspicion, although of course in the case of normal frequency it would be as easy or even easier to calculate the real probability as the generalised Tchebycheff limit.

The chief interest of the investigation thus far is to show that unless we use an $I_s$ of a high order the Tchebycheff limit is unlikely to be of very much service. We can obtain it in the case of material following a normal distribution, but then we know the exact result and do not need it!
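The illustration can be followed step by step in a short computation. The sketch below assumes a normal frequency surface, for which $I_s = 2s(2s-2)\dots 2 = 2^s\,s!$ and the actual probability inside the contour is $1 - e^{-\frac12\chi_0^2}$; the function names are hypothetical.

```python
from math import exp, factorial, floor


def chi2_contour(dx, dy, r):
    """chi_0^2 of the normal contour through deviations dx, dy (in sigma units)."""
    return (dx * dx - 2.0 * r * dx * dy + dy * dy) / (1.0 - r * r)


def tchebycheff_limit(chi2_0, s):
    """Lower limit for P using I_s = 2^s * s! of the normal surface."""
    return 1.0 - (2 ** s) * factorial(s) / chi2_0 ** s


def best_order(chi2_0):
    """Greatest integer not exceeding chi_0^2 / 2 (at least 1)."""
    return max(1, floor(chi2_0 / 2.0))


def actual_probability(chi2_0):
    """Volume of the normal frequency surface inside the contour chi_0^2."""
    return 1.0 - exp(-0.5 * chi2_0)


if __name__ == "__main__":
    c = chi2_contour(3.8, 3.2, 0.7)
    print(round(c, 2))                       # about 15.01
    s = best_order(15)
    print(s, tchebycheff_limit(15, s), actual_probability(15))
```

Taking $\chi_0^2 = 15$ gives $s = 7$, a limit just above .9962, and an actual probability of about .9994, so that the limit understates the odds by a factor of roughly seven.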
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I have considered very carefully ‘the possibilities of deducing higher q’s from 
lower q's for non-normal systems on various hypotheses as to the nature of the 
regression and the scedasticity. The simplest hypothesis is to suppose linearity of 
regression, homoscedasticity and homocliticity of both sets of arrays. 


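Let
$$\beta_{2s-2} = \frac{1}{N}\,S(x^{2s})/\sigma_x^{2s}, \quad\text{and}\quad \beta_{2s-1}/\sqrt{\beta_1} = \frac{1}{N}\,S(x^{2s+1})/\sigma_x^{2s+1}$$
as usual; let a single dash mark the β's for the y variate, and double dashes the
β's for the y arrays of x's and triple dashes the β's for the x arrays of y's. Then
if $\bar{y}_x$ be the mean of the x-array of y's,
$$\sqrt{\beta_1'} = \frac{1}{N}\,S(y^3)/\sigma_y^3 = \frac{1}{N}\,\Sigma\, S(\bar{y}_x + y')^3/\sigma_y^3,$$
where y' is measured from the mean of the array, S is the sum for all members of
the array and Σ the sum for all arrays. Thus if $\bar{y}_x = r\dfrac{\sigma_y}{\sigma_x}x$ be the regression line,
$$\sqrt{\beta_1'} = \frac{1}{N\sigma_y^3}\left\{ r^3\frac{\sigma_y^3}{\sigma_x^3}\,\Sigma(n_x x^3) + 3r\frac{\sigma_y}{\sigma_x}\,\Sigma\{x\,S(y'^2)\} + \Sigma\,S(y'^3)\right\},$$
since $S(y'^2)/n_x$ and $S(y'^3)/n_x$ are to be the same for every array. Thus
$$\sqrt{\beta_1'} = r^3\sqrt{\beta_1} + \sqrt{\beta_1'''}\,(1-r^2)^{3/2},$$
or
$$\sqrt{\beta_1'''} = \frac{\sqrt{\beta_1'} - r^3\sqrt{\beta_1}}{(1-r^2)^{3/2}} \qquad\text{(ix)}.$$
Similarly
$$\sqrt{\beta_1''} = \frac{\sqrt{\beta_1} - r^3\sqrt{\beta_1'}}{(1-r^2)^{3/2}} \qquad\text{(x)}.$$

Thus it is impossible in homoclitic systems for the skewness of the arrays to be
equal to the skewness of the marginal totals if there be correlation*.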


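Again we have
$$\beta_2' = \frac{1}{N}\,S(y^4)/\sigma_y^4 = \frac{1}{N}\,\Sigma\,S\left(r\frac{\sigma_y}{\sigma_x}x + y'\right)^4/\sigma_y^4 = r^4\beta_2 + 6r^2(1-r^2) + \beta_2'''(1-r^2)^2,$$
or
$$\beta_2''' = \frac{\beta_2' - r^4\beta_2 - 6r^2(1-r^2)}{(1-r^2)^2},$$
or, again,
$$\beta_2''' - 3 = \frac{\beta_2' - 3 - r^4(\beta_2 - 3)}{(1-r^2)^2} \qquad\text{(xi)},$$
and similarly,
$$\beta_2'' - 3 = \frac{\beta_2 - 3 - r^4(\beta_2' - 3)}{(1-r^2)^2} \qquad\text{(xii)}.$$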


* We note that if the marginal totals be both without skewness, all the arrays will also be symme- 
trical. Equations (xi) and (xii) show us that if the marginal totals be mesokurtic the arrays will also 
be mesokurtic. 
Now consider $q_{22}$,
$$q_{22} = \frac{1}{N}\,S(x^2y^2)/(\sigma_x^2\sigma_y^2) = \frac{1}{N}\,\Sigma\,S\,x^2\left(r\frac{\sigma_y}{\sigma_x}x + y'\right)^2/(\sigma_x^2\sigma_y^2)$$
$$= \beta_2 r^2 + (1-r^2)$$
$$= \beta_2' r^2 + (1-r^2) \qquad\text{(xiii)}$$
by symmetry. Hence it follows that in linear homoscedastic systems $\beta_2 = \beta_2'$, and
accordingly
$$\beta_2'' - 3 = \beta_2''' - 3 = (\beta_2 - 3)\,\frac{1+r^2}{1-r^2} \qquad\text{(xiv)}.$$

This is of interest as indicating that in linear homoscedastic systems the one with
mesokurtic margins is the only one in which the kurtosis of the arrays can be the
same as that of the margins.
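
The relations (ix) to (xiv) are easily put to numerical trial. The sketch below is an illustration only: the function names are hypothetical, and the arguments are the marginal $\sqrt{\beta_1}$ or $\beta_2$ of the variate whose arrays are wanted, followed by that of the other variate, under the hypotheses of linear regression, homoscedasticity and homocliticity.

```python
def array_skewness(sqrt_b1_own, sqrt_b1_other, r):
    """Equations (ix)/(x): sqrt(beta_1) of the arrays of one variate."""
    return (sqrt_b1_own - r ** 3 * sqrt_b1_other) / (1.0 - r * r) ** 1.5

def array_kurtosis_excess(b2_own, b2_other, r):
    """Equations (xi)/(xii): beta_2 - 3 of the arrays of one variate."""
    return (b2_own - 3.0 - r ** 4 * (b2_other - 3.0)) / (1.0 - r * r) ** 2

def q22(b2, r):
    """Equation (xiii): q_22 in terms of the marginal beta_2 and r."""
    return b2 * r * r + (1.0 - r * r)

r = 0.5
# Equal marginal kurtosis beta_2 = beta_2' = 4: (xi) must agree with (xiv).
excess_xi = array_kurtosis_excess(4.0, 4.0, r)
excess_xiv = (4.0 - 3.0) * (1.0 + r * r) / (1.0 - r * r)
# Equal marginal skewness 0.5 in both variates: the arrays come out more skew.
skew_arrays = array_skewness(0.5, 0.5, r)
print(excess_xi, excess_xiv, skew_arrays)
```

With mesokurtic margins (both $\beta_2 = 3$) the array excess vanishes for every r, in agreement with the footnote, while equal non-zero marginal skewness never reproduces itself in the arrays when r differs from zero.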


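Again
$$q_{33} = \frac{1}{N}\,S(x^3y^3)/(\sigma_x^3\sigma_y^3) = \frac{1}{N}\,\Sigma\,S\,x^3\left(r\frac{\sigma_y}{\sigma_x}x + y'\right)^3/(\sigma_x^3\sigma_y^3)$$
$$= r^3\beta_4 + \frac{3r}{N}\,\Sigma\{x^4 S(y'^2)\}/(\sigma_x^4\sigma_y^2) + \frac{1}{N}\,\Sigma\{x^3 S(y'^3)\}/(\sigma_x^3\sigma_y^3)$$
$$= r^3\beta_4 + 3r(1-r^2)\beta_2 + \sqrt{\beta_1}\sqrt{\beta_1'} - r^3\beta_1 \qquad\text{(xv)}$$
$$= r^3\beta_4' + 3r(1-r^2)\beta_2' + \sqrt{\beta_1}\sqrt{\beta_1'} - r^3\beta_1' \qquad\text{(xvi)}$$
by symmetry.
It follows from (xv) and (xvi) that it is needful for
$$\beta_4 - \beta_1 = \beta_4' - \beta_1' \qquad\text{(xvii)}.$$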


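Finally we have
$$q_{44} = \frac{1}{N}\,S(x^4y^4)/(\sigma_x^4\sigma_y^4) = \frac{1}{N}\,\Sigma\,S\,x^4\left(r\frac{\sigma_y}{\sigma_x}x + y'\right)^4/(\sigma_x^4\sigma_y^4)$$
$$= r^4\beta_6 + 6r^2(1-r^2)\beta_4 - 4r^2\beta_3 + 4\beta_3\sqrt{\beta_1'}/\sqrt{\beta_1} + \beta_2\beta_2'''(1-r^2)^2,$$
or
$$q_{44} = r^4\beta_6 + 6r^2(1-r^2)\beta_4 - 4r^2\beta_3 + 4\beta_3\sqrt{\beta_1'/\beta_1} - r^4\beta_2^2 - 6r^2(1-r^2)\beta_2 + \beta_2\beta_2'$$
$$= r^4\beta_6' + 6r^2(1-r^2)\beta_4' - 4r^2\beta_3' + 4\beta_3'\sqrt{\beta_1/\beta_1'} - r^4\beta_2'^2 - 6r^2(1-r^2)\beta_2' + \beta_2\beta_2' \qquad\text{(xviii)},$$
which again involves the complicated β-relation
$$r^4(\beta_6 - \beta_6') + 6r^2(1-r^2)(\beta_4 - \beta_4') - 4r^2(\beta_3 - \beta_3') + 4(\beta_3\beta_1' - \beta_3'\beta_1)/\sqrt{\beta_1\beta_1'} = 0 \qquad\text{(xix)}.$$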


It is difficult to see how the form of variation of one character can be related by
the correlation between that and another character to the form of variation of the
second character as (xix) would indicate. If it were we should get into great
difficulties in dealing with similar conditions to (xix) for a large number of characters
with different correlations. If as it appears to me (xix) would need to be satisfied
independently of r, then we must have
$$\beta_6 - 6\beta_4 = \beta_6' - 6\beta_4', \qquad 3\beta_4 - 2\beta_3 = 3\beta_4' - 2\beta_3', \qquad \beta_3/\beta_1 = \beta_3'/\beta_1' \qquad\text{(xx)}.$$

The second of (xx) by aid of (xvii) leads us to
$$3\beta_1\left(1 - \frac{2}{3}\,\frac{\beta_3}{\beta_1}\right) = 3\beta_1'\left(1 - \frac{2}{3}\,\frac{\beta_3'}{\beta_1'}\right),$$
whence $\beta_1 = \beta_1'$, and as $\beta_2 = \beta_2'$ it follows that $\beta_3 = \beta_3'$, $\beta_4 = \beta_4'$, $\beta_6 = \beta_6'$, that is to
say the total frequencies of the two correlated characters must possess variation
practically of the same type.


Now I find this is very far from being the case in distributions which differ 
widely from the normal correlation surface. Thus it follows that the hypothesis of 
homoscedasticity, linear regression and homocliticity fails for such cases. I therefore 
modified the linear regression and adopted skew regression, homoscedasticity and 
homocliticity. I again got relations between the β's, but of a much higher degree
of complexity. These were tested by Mr A. W. Young and myself on the skew 
correlation surfaces of barometric data, but were found to fail. Direct investigation 
afterwards showed me that while the regression differed to some extent from 
linearity, it was the homoscedasticity which was in the first place the erroneous 
assumption. The arrays were very far from having the same standard deviations.


Until therefore some theoretical advance is made in the investigation of skew
regression surfaces, especially for those which have linear or nearly linear regression
combined with heteroscedasticity, it is unlikely that we shall have any adequate
method of determining high product moment-coefficients from low ones. We are
accordingly thrown back on direct determination of the high product moment-
coefficients, if we wish to determine a Tchebycheff limit. The work of determining
$I_4$ would involve a whole round of 8th order moment-coefficients and product
moment-coefficients. It would then give us a limit of the order .95 for .99. Lower
order I's would hardly give values of much importance, and it may be questioned
whether a rough limit of the kind required could not be better obtained by inserting 
the desired contour on a “scatter diagram” and simply counting the dots which 
fall outside it, or indeed by taking the best fitting normal surface to the actual 
distribution. The reader may question whether something better could not be 
achieved for skew correlation Tchebycheff limits by some contour other than the 
ellipse. This would undoubtedly be the case, if we knew the forms of the skew-
correlation contours, for then we should undoubtedly choose this equi-probable locus
for our boundary. But as we have only a knowledge of these empirically—experience 
shows them to be frequently pear or lemniscate loop shaped—we get little help for 
our present problem. 


One other aspect of the matter may be briefly considered. We may find a limit
to the probability that an event or individual will lie within a circle of radius R
round the origin. This corresponds to Schols’ Problem*. It may be useful to
have a Tchebycheff limit for this case, although we have yet to meet the particular
instance in practical statistics where it would be of marked advantage†.

We can best investigate this problem de novo.


Let
$$I_s = \iint (x^2 + y^2)^s\,\phi(x, y)\,dx\,dy.$$
Then if R be any radius round the origin,
$$I_s/R^{2s} = \iint \left(\frac{x^2 + y^2}{R^2}\right)^s \phi(x, y)\,dx\,dy,$$
the integral being taken to include the whole volume of the probability surface
$z = \phi(x, y)$. Now pick out those elements of the integral for which $x^2 + y^2$ is
$> R^2$, then
$$I_s/R^{2s} > \iint \left(\frac{x^2 + y^2}{R^2}\right)^s \phi(x, y)\,dx\,dy,$$
where the integration extends over these selected elements only, and is
therefore
$$> \iint \phi(x, y)\,dx\,dy,$$
but this integral is 1 − P, where P is the probability that the individual falls within
the distance R of the origin. Thus the Tchebycheff limit is given by
$$P > 1 - \frac{I_s}{R^{2s}}.$$
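
The inequality holds for any distribution whatever, and this may be seen on a sample. The sketch below is an illustration under stated assumptions: the skew sample is invented for the purpose (it is not data from this memoir), $I_s$ is estimated as a sample mean about the origin, and the function names are hypothetical.

```python
import random

def circle_limit(points, radius, s):
    """Tchebycheff limit 1 - I_s / R^(2s), with I_s taken as a sample mean."""
    i_s = sum((x * x + y * y) ** s for x, y in points) / len(points)
    return 1.0 - i_s / radius ** (2 * s)

def fraction_inside(points, radius):
    """Observed proportion of points within distance `radius` of the origin."""
    inside = sum(1 for x, y in points if x * x + y * y <= radius * radius)
    return inside / len(points)

rng = random.Random(1919)  # arbitrary seed, for reproducibility only
points = []
for _ in range(5000):
    x = rng.expovariate(1.0) - 1.0          # skew variate centred on zero
    y = 0.5 * x + rng.uniform(-1.0, 1.0)    # correlated with x
    points.append((x, y))

radius = 3.0
observed = fraction_inside(points, radius)
limits = {s: circle_limit(points, radius, s) for s in (1, 2, 3, 4)}
print(observed, limits)
```

Every limit falls at or below the observed proportion, whichever s is used; which s gives the closest limit depends on the radius and on the heaviness of the tails.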
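Now clearly we have
$$I_s = \iint (x^2 + y^2)^s\,\phi(x, y)\,dx\,dy$$
$$= p_{2s,0} + s\,p_{2s-2,2} + \frac{s(s-1)}{1\cdot 2}\,p_{2s-4,4} + \dots$$
$$= \sigma_1^{2s}q_{2s,0} + s\,\sigma_1^{2s-2}\sigma_2^{2}\,q_{2s-2,2} + \frac{s(s-1)}{1\cdot 2}\,\sigma_1^{2s-4}\sigma_2^{4}\,q_{2s-4,4} + \dots$$

Now write $R = \lambda\sqrt{\sigma_1^2 + \sigma_2^2}$, and further take $\tan\theta = \sigma_2/\sigma_1$. Then
$$\frac{I_s}{R^{2s}} = \frac{1}{\lambda^{2s}}\Big\{\cos^{2s}\theta\,q_{2s,0} + s\cos^{2s-2}\theta\sin^2\theta\,q_{2s-2,2} + \frac{s(s-1)}{1\cdot 2}\cos^{2s-4}\theta\sin^4\theta\,q_{2s-4,4}$$
$$\qquad + \frac{s(s-1)(s-2)}{1\cdot 2\cdot 3}\cos^{2s-6}\theta\sin^6\theta\,q_{2s-6,6} + \dots\Big\}.$$
For the particular case in which s = 1,
$$\frac{I_1}{R^2} = \frac{1}{\lambda^2}(\cos^2\theta + \sin^2\theta) = \frac{1}{\lambda^2}.$$
For s = 2,
$$\frac{I_2}{R^4} = \frac{1}{\lambda^4}(\cos^4\theta\,\beta_2 + 2\cos^2\theta\sin^2\theta\,q_{22} + \sin^4\theta\,\beta_2').$$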


* Over de Theorie der Fouten in Ruimte en in het platte Vlak, Verhandelingen der K. Akademie van
Wetenschappen, Deel xv, pp. 1–68, Amsterdam, 1875. Translated into French in the Annales de l’École
polytechnique de Delft, Tome II, pp. 123–178. Leide, 1886.

† It is conceivable that the solutions given might be serviceable in the case of testing machine guns
against a target.
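Now a good approximation to $q_{22}$ by (xiii) must be $\tfrac{1}{2}(\beta_2 + \beta_2')r^2 + (1 - r^2)$;
hence substituting
$$\frac{I_2}{R^4} = \frac{1}{\lambda^4}\{(\beta_2 - 3)(\cos^4\theta + r^2\cos^2\theta\sin^2\theta) + (\beta_2' - 3)(\sin^4\theta + r^2\cos^2\theta\sin^2\theta)$$
$$\qquad + 3 - 4(1 - r^2)\cos^2\theta\sin^2\theta\}.$$

For the special case of normal distribution, if we write $\kappa^2 = 4(1 - r^2)\cos^2\theta\sin^2\theta$,
$$\frac{I_2}{R^4} = \frac{1}{\lambda^4}(3 - \kappa^2).$$
Again
$$\frac{I_3}{R^6} = \frac{1}{\lambda^6}\{\cos^6\theta\,\beta_4 + 3\cos^2\theta\sin^2\theta\,(\cos^2\theta\,q_{42} + \sin^2\theta\,q_{24}) + \sin^6\theta\,\beta_4'\},$$
and for a normal distribution,
$$\frac{I_3}{R^6} = \frac{1}{\lambda^6}(15 - 9\kappa^2).$$

Further general cases can be at once written down, but it will suffice to give
here the leading values of $I_s$ for a normal distribution:
$$\frac{I_1}{R^2} = \frac{1}{\lambda^2}, \qquad \frac{I_2}{R^4} = \frac{1}{\lambda^4}(3 - \kappa^2), \qquad \frac{I_3}{R^6} = \frac{1}{\lambda^6}(15 - 9\kappa^2),$$
$$\frac{I_4}{R^8} = \frac{1}{\lambda^8}(105 - 90\kappa^2 + 9\kappa^4), \qquad \frac{I_5}{R^{10}} = \frac{1}{\lambda^{10}}(945 - 1050\kappa^2 + 225\kappa^4),$$
$$\frac{I_6}{R^{12}} = \frac{1}{\lambda^{12}}(10{,}395 - 14{,}175\kappa^2 + 4725\kappa^4 - 225\kappa^6),$$
$$\frac{I_7}{R^{14}} = \frac{1}{\lambda^{14}}(135{,}135 - 218{,}295\kappa^2 + 99{,}225\kappa^4 - 11{,}025\kappa^6),$$
$$\frac{I_8}{R^{16}} = \frac{1}{\lambda^{16}}(2{,}027{,}025 - 3{,}783{,}780\kappa^2 + 2{,}182{,}950\kappa^4 - 396{,}900\kappa^6 + 11{,}025\kappa^8).$$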


The following table gives the maximum Tchebycheff limit for the probability of
an individual falling within the circle $\lambda\sqrt{\sigma_1^2 + \sigma_2^2}$ for various values of
$$\kappa^2 = 4(1 - r^2)\,\sigma_1^2\sigma_2^2/(\sigma_1^2 + \sigma_2^2)^2.$$
($I_s$) denotes the particular I from which the maximum limit is found. ($I_8$?)
denotes that the corresponding numerical value is a Tchebycheff limit found from
$I_8$, but it is not known whether $I_9$ would not give a higher value, $I_9$ not having
been tabled. The second part of the table provides the values of $I_s$ from which
the first part has been computed. They may be useful in the determination of the
Tchebycheff limits for other values of λ.
I. Generalised Tchebycheff Limit for Schols’ Problem with a Normal Distribution.

Radius of circle $= \lambda\sqrt{\sigma_1^2 + \sigma_2^2}$, $\kappa^2 = 4(1 - r^2)\,\sigma_1^2\sigma_2^2/(\sigma_1^2 + \sigma_2^2)^2$.

| Values of κ² | λ = 1.0 | λ = 1.25 | λ = 1.5 | λ = 2.0 | λ = 2.5 | λ = 3.0 | λ = 3.5 | λ = 4.0 |
|---|---|---|---|---|---|---|---|---|
| 0.0 | 0 (I_1) | .36 (I_1) | .5556 (I_1) | .8125 (I_2) | .9386 (I_3) | .98400 (I_4 = I_5) | .996924 (I_6) | .999528 (I_8) |
| 0.1 | 0 (I_1) | .36 (I_1) | .5556 (I_1) | .81875 (I_2) | .9422 (I_3) | .98574 (I_5) | .997329 (I_6) | .999611 (I_8?) |
| 0.2 | 0 (I_1) | .36 (I_1) | .5556 (I_1) | .8250 (I_2) | .9459 (I_3) | .98740 (I_5) | .997707 (I_6) | .999685 (I_8?) |
| 0.3 | 0 (I_1) | .36 (I_1) | .5556 (I_1) | .83125 (I_2) | .9496 (I_3) | .98899 (I_5) | .998109 (I_7) | .999749 (I_8?) |
| 0.4 | 0 (I_1) | .36 (I_1) | .5556 (I_1) | .8375 (I_2) | .9538 (I_4) | .99050 (I_5) | .998478 (I_7) | .999805 (I_8?) |
| 0.5 | 0 (I_1) | .36 (I_1) | .5556 (I_1) | .84375 (I_2) | .9592 (I_4) | .99193 (I_5) | .998806 (I_7) | .999853 (I_8?) |
| 0.6 | 0 (I_1) | .36 (I_1) | .5556 (I_1) | .8500 (I_2 = I_3) | .9645 (I_4) | .99333 (I_6) | .999096 (I_8) | .999893 (I_8?) |
| 0.7 | 0 (I_1) | .36 (I_1) | .5556 (I_1) | .8641 (I_3) | .9696 (I_4) | .99490 (I_6) | .999380 (I_8) | .999927 (I_8?) |
| 0.8 | 0 (I_1) | .36 (I_1) | .5654 (I_2) | .8781 (I_3) | .9746 (I_4) | .99631 (I_6) | [illegible] | .999954 (I_8?) |
| 0.9 | 0 (I_1) | .36 (I_1) | .5852 (I_2) | .8922 (I_3) | .9809 (I_5) | .99770 (I_7) | .999788 (I_8) | .999975 (I_8?) |
| 1.0 | 0 (I_1) | .36 (I_1) | .6049 (I_2) | .90625 (I_3 = I_4) | .9879 (I_6) | .99895 (I_7) | .999920 (I_8?) | .999991 (I_8?) |








II. Values of the functions $I_s$ forming the denominator of the Tchebycheff Limit to
the probability that an Individual will fall for the case of Normal Bi-variate
Frequency within a given circle of radius $\sqrt{\sigma_1^2 + \sigma_2^2}$.

| κ² | I_1 | I_2 | I_3 | I_4 | I_5 | I_6 | I_7 | I_8 |
|---|---|---|---|---|---|---|---|---|
| 0.0 | 1 | 3.0 | 15.0 | 105.00 | 945.00 | 10,395.000 | 135,135.000 | 2,027,025.0000 |
| 0.1 | 1 | 2.9 | 14.1 | 96.09 | 842.25 | 9,024.525 | 114,286.725 | 1,670,080.7025 |
| 0.2 | 1 | 2.8 | 13.2 | 87.36 | 744.00 | 7,747.200 | 95,356.800 | 1,354,429.4400 |
| 0.3 | 1 | 2.7 | 12.3 | 78.81 | 650.25 | 6,561.675 | 78,279.075 | 1,077,729.5025 |
| 0.4 | 1 | 2.6 | 11.4 | 70.44 | 561.00 | 5,466.600 | 62,987.400 | 837,665.6400 |
| 0.5 | 1 | 2.5 | 10.5 | 62.25 | 476.25 | 4,460.625 | 49,415.625 | 631,949.0625 |
| 0.6 | 1 | 2.4 | 9.6 | 54.24 | 396.00 | 3,542.400 | 37,497.600 | 458,317.4400 |
| 0.7 | 1 | 2.3 | 8.7 | 46.41 | 320.25 | 2,710.575 | 27,167.175 | 314,534.9025 |
| 0.8 | 1 | 2.2 | 7.8 | 38.76 | 249.00 | 1,963.800 | 18,358.200 | 198,392.0400 |
| 0.9 | 1 | 2.1 | 6.9 | 31.29 | 182.25 | 1,300.725 | 11,004.525 | 107,705.9025 |
| 1.0 | 1 | 2.0 | 6.0 | 24.00 | 120.00 | 720.000 | 5,040.000 | 40,320.0000 |
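
Both parts of the table follow from the polynomial values of $I_s$ given for a normal distribution, so that limits for other values of λ and κ² can be had with little labour. The following Python sketch is one way of doing so; the coefficients are copied from the leading values listed in the text, while the function names and the restriction to s = 1 to 8 are choices made here for illustration.

```python
# Coefficients of kappa^(2k), k = 0, 1, 2, ..., in lambda^(2s) * I_s / R^(2s).
I_COEFFS = {
    1: [1],
    2: [3, -1],
    3: [15, -9],
    4: [105, -90, 9],
    5: [945, -1050, 225],
    6: [10395, -14175, 4725, -225],
    7: [135135, -218295, 99225, -11025],
    8: [2027025, -3783780, 2182950, -396900, 11025],
}

def i_value(s, kappa2):
    """Tabled function I_s for the circle of radius sqrt(sigma_1^2 + sigma_2^2)."""
    return sum(c * kappa2 ** k for k, c in enumerate(I_COEFFS[s]))

def max_limit(lam, kappa2):
    """Return (limit, s) for the largest 1 - I_s / lambda^(2s), s = 1..8."""
    return max((1.0 - i_value(s, kappa2) / lam ** (2 * s), s) for s in I_COEFFS)

limit_a, s_a = max_limit(2.5, 0.4)
limit_b, s_b = max_limit(3.5, 0.3)
print(round(limit_a, 4), s_a, round(limit_b, 6), s_b)
```

Where the greatest limit comes from s = 8 the true maximum may lie at a higher order, which this sketch, like the table, cannot decide.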





The reader may be curious to know whether the Tchebycheff limit gives
a better result for Schols’ circles than for the elliptic contours. The actual
probability of an individual falling within the circle of radius $\lambda\sqrt{\sigma_1^2 + \sigma_2^2}$ is given by
$$P = 1 - \frac{\kappa}{\pi}\int_0^{\pi} \frac{e^{-\frac{\lambda^2}{\kappa^2}(1 - \kappa'\cos\theta)}}{1 - \kappa'\cos\theta}\,d\theta,$$
where $\kappa' = \sqrt{1 - \kappa^2}$ and $\kappa^2 = 4(1 - r^2)\,\sigma_1^2\sigma_2^2/(\sigma_1^2 + \sigma_2^2)^2$ as before.

I have not succeeded in finding any rapidly converging expansion for this
expression*, and have been reduced to evaluating its argument and using a quadrature
formula. Thus for λ = 2, κ² = .4, I find

P = .963,3694.


* Unfortunately Schols has not tabled P, but only gives the values of λ for ten values of κ², which
occur when P = 1/2, i.e. radial values for generalised “probable errors.”
The process is not as long as it might seem. Indeed if we only need four decimal
places, it is quite adequate to integrate only through the first quadrant, the second
contributes nothing of importance. The value given by the last Tchebycheff limit is


P > .8375.


This is of the same order of divergence as we found for the elliptic contour, i.e. for
χ² = 7, we had P = .9698, with a Tchebycheff limit P > .8600. Thus the measure
of approach does not seem very close in this case until we reach higher values of λ.
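
The quadrature is readily repeated. The sketch below uses Simpson's rule on the integral $P = 1 - \frac{\kappa}{\pi}\int_0^{\pi} e^{-(\lambda^2/\kappa^2)(1-\kappa'\cos\theta)}\,d\theta/(1-\kappa'\cos\theta)$, with $\kappa' = \sqrt{1-\kappa^2}$; Simpson's rule, the number of ordinates and the function name are choices made here and are not necessarily those used for the value quoted in the text. It supposes 0 < κ² ≤ 1.

```python
import math

def circle_probability(lam, kappa2, upper=math.pi, n=2000):
    """Probability inside the circle lam * sqrt(s1^2 + s2^2) for normal frequency.

    Simpson's rule over theta from 0 to `upper`; n must be even.
    """
    kappa = math.sqrt(kappa2)
    kappa_dash = math.sqrt(1.0 - kappa2)

    def integrand(theta):
        u = 1.0 - kappa_dash * math.cos(theta)
        return math.exp(-lam * lam / kappa2 * u) / u

    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return 1.0 - kappa / math.pi * total * h / 3.0

p_full = circle_probability(2.0, 0.4)
p_first_quadrant = circle_probability(2.0, 0.4, upper=math.pi / 2)
print(p_full, p_first_quadrant)
```

For κ² = 1 the integrand is constant and the expression reduces to $1 - e^{-\lambda^2}$, which serves as a check on the form of the integral; comparing the two printed values shows how little the second quadrant contributes.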


On the whole we must express disappointment at the results of the Tchebycheff 
process. We had found Tchebycheff’s own limit based only on the second moment 
of small practical value, although it is to be found occupying a prominent position 
in many continental works on probability. By extending it to higher moments and 
product-moments we have reached results which are great improvements on the 
original Tchebycheff limit, but the method still lacks the degree of approximation 
(except for probabilities over .99, say) which would make the result of real value in
practical statistics. It is, however, conceivable that some more ingenious application 
of Tchebycheff’s idea may lead to a limit more close to the actual value of the 
probability. 

















CHARLES B. GORING, 1870—1919. 


“His work won full recognition from those who value scientific research. But 
it is a strange commentary on the Civil Service, that, when so pressing a problem 
as prison reform still confronts us, so fine a worker and so human a man should 
have been given but the (medical) administration of a great prison instead of being 
called in to deal with a work for which all his gifts supremely fitted him.” The 
Nation. 


The late Charles B. Goring, M.D., was a distinguished student of University 
College, London, and afterwards a Fellow of that College. During his career his 
studies were far from confined to medicine: he was much interested in literature 
and philosophy, being awarded the John Stuart Mill Studentship in Philosophy of 
Mind and Logic in 1893, probably the only occasion on which that studentship 
has fallen to a medical exhibitioner*. It was not therefore surprising to those
who knew something of the remarkable powers of sympathy, the width of interests 
and the facility of expression which characterised Goring to find that he would 
write a blue-book, as no blue-book has been written since the time of Matthew Arnold. 
He would handle facts, but at the same time he would appeal by his imagination 
and gift of language not only to the sociologist but to every man who is fascinated 
by the human spirit in all its diverse phases. Goring lived with his criminals, and 
studied them in and out of prison as the naturalist studies life in the field, and as 
the humanist studies mankind in its thronged resorts. Ask Goring what a convict’s 
mind was like and he replied unhesitatingly: Like yours and mine. The same 
delicate spirit of sympathy that went out to his friends in both the joy and the 
sorrow of life, drew the criminal to him, and the link often grew so close that the 
prison medical officer became the father-confessor: the psychology of the criminal
mind was laid bare, and thus Goring’s insight into criminality, its source and its 
motives, grew deeper and more and more coordinated as the years of service increased. 
Yet he never hesitated to exhibit the same tender sympathy alike to each new 
sojourner and to each oft returning old prison inmate, while his own nature widened 
and strengthened under an environment which appears to dull the mentality of so 
many men in the prison service. Only last Christmas the present writer dis- 
cussed with him the possibility of a series of essays on the psychology of crime to 
be based indeed on facts acquired by scientific study, but to exhibit a structure 
from which the scaffolding should have been stript, and which should convince 
the beholder of the fitness of its purpose solely by the beauty and truth of its 
lines. The path to truth is an arduous one, but when we have reached the 


* Goring was awarded the Weldon Medal and premium by the University of Oxford in 1914 and 
never will a more fitting award of that medal be made; his work ‘‘ The English Convict ” was undoubtedly 
the finest contribution to biometry of its quinquennium. 
summit we know by the width of our prospect into all neighbouring spheres that
we have attained it: 


“Qui veram habet ideam, simul scit se veram habere ideam, nec de rei 
veritate potest dubitare.” 


We may now have to await that work for generations until another prison medical 
officer arises with Goring’s scientific knowledge, discriminative sympathy and fine 
power of expression. Battling with a gaol epidemic of influenza, when he should 
himself have been in bed, Goring fell an easy prey to pneumonia, which a strong 
will coupled with a spare and delicate frame cannot resist as their combination so 
often does many of Death’s onsets. Goring died as he himself and his friends 
would have wished, doing his duty to the last at his post. His work was uncompleted 
as good men’s work so often must be. He was studying at the time the influence 
of the war on the nature and frequency of crime—a subject on which much will no 
doubt be said, but most probably with small scientific basis. How shall we estimate 
his work, now that he has left us? We pass by the criticisms of men inside and 
outside the prison service, for they will leave neither in their own productions nor 
in their criticisms anything that will remain of permanent value to the new science 
of criminology as Goring outlined it ; those who have had like experience lack either 
his insight, or his logical mentality, or his power of expression. They were not 
trained in the same school, nor had they the penetralia mentis, or rather what 
“the Romans called ingenium,” which through its very innateness carries mankind 
onward a step, assured, not doubtful or to be retraced. The contest between 
mediocrity and inspiration is as old as history and the creator, the poet, wins, if 
not in life, yet thereafter. The world has yet to realise that achievement in every 
field is the product of trained imagination alone. Truth in science as in art is not 
the product of mere computation or careful observation, but of these guided by
fertility of imagination. The creative mind has the potentiality of poet, artist
and scientist within its grasp, and Goring’s friends were never very certain in 
which category to place him. Perhaps the specification was as difficult and would 
be as unprofitable as it must ever be in the case of the Florentine, the master 
spirit of this type of mind. 


To the present writer fell the good fortune to be in close touch with Goring 
(and his keen co-worker, H. E. Soper) for that long period of two and a half years 
during which “The English Convict” was in process of creation. He observed 
Goring in times of difficulty when the intertwined skein would not unravel, and in 
times of achievement when the tangle loosened as by magic. He realised the
quiet persistency with which Goring grappled with the most intricate problems 
and the gentle satisfaction he exhibited when assimilating and recording a new and 
striking point. When finally the great manuscript had gone to press, we who had 
been working alongside him at our own tasks knew one and all that while we were 
losing a cherished daily intimacy, we had still individually gained a life-long friend. 
We felt that had the world been rightly organised—which it ever fails to be— 
a post in our midst would have been found available for Charles Goring, for no 
man was better fitted than he to “study those agencies under social control that
may improve or impair the racial qualities of future generations, either physically 
or mentally”; none we had come across was so well suited to make knowledge 
reached by scientific research a factor of social progress. He knew how to clothe 
scientific results in a garb which captivated the mental eye of him who listened to 
his spoken or read his written words. Goring was intended by nature for a master- 
craftsman of exposition. His sceptical spirit demanding a rigid foundation for 
truth was combined with an unlimited enthusiasm that truth when known should 
be proclaimed to the many. Yet in his own life, “Thrones, powers, dominions 
blocked the view, with episodes and underlings.” 


What then is the outcome of Goring’s work? Has he decreased crime or 
bettered the lot of the criminal? Not directly, the solitary individual can achieve 
little in this sense ; he has moved stones from the path of the outcast, and we can 
picture many a criminal who would have wished to stand by his graveside. Has 
he pointed out the lines upon which the state in future should deal with its 
defaulters? Again not directly, but only indirectly. What then has he achieved ? 
He has given us a portrait of the criminal as he really exists; he has painted 
in the nature of his physique, he has indicated his facial and underlying mental 
traits, his hereditary tendencies and his home associations. And he has made for 
ever atypical the criminal of current drama and novelistic literature. Here it is 
that literature owes a deep debt to Goring. It cannot survive without its villains 
but the individual writer will never be as intimate as Goring was with poisoner, 
murderer and spy. Yet if that writer approaches with intuition not the masses 
of statistical data, but the text of Goring’s life-work, even in its recently issued 
abridgement*, he will learn to see the criminal as Goring saw him, he will learn to
know the real man and his attitude to crime. He will learn that Goring was a 
creator in the literary sense†, and with imagination stirred he will feel the
impulse to adopt and adapt that realistic portrait of the criminal as only true art 
can do. Through literature the world at large will know at last what crime and 
the criminal really are. Not only will literature profit, but the world which 
easily grasps truth when depicted by art will understand and gain something of 
the spirit of the man whose life’s work alas! is embraced within the livid wrappers 
of a government publication. 


“En mands gerning er hans sjael, og sin gerning skal blive ved at leve pa 
jorden.”—The work of a man is his soul, and on earth his work shall not perish. 


* “The English Convict” (Abridgement), Wymans & Co., 1915.

† The present writer has many sins to atone for, but perhaps none he regrets now more than the
stringency with which he docked the original MS. of Charles Goring of many of its literary qualities as 
unsuited to a scientific and government publication. 


K. P.
APPRECIATIONS OF CHARLES GORING.


To the readers of Biometrika the following sympathetic accounts of the personality 
of Charles Goring will appeal as they do to the Editor, who deeply values the 
privilege of being allowed to publish these very intimate characterisations. The 
first is by Mr E. V. Lucas, a college friend of Goring’s; they both belonged to one 
of those periods of keen intellectual activity which arise occasionally in college life 
owing partly to the action of waves of external thought, but more often to the 
presence internally of one or two original minds. Outwardly the period in question 
was marked by the foundation of the Students’ Union and the meteoric brilliancy 
of The Privateer—a college journal that one did not grudge purchasing. It was 
for Goring the moulding time,—the golden days, when there was leisure to think, 
interpolated between an uncongenial office experience and the wider but none the 
less toilsome experience of a medical officer on a hospital ship during the South
African War. 


The second appreciation is the oration bravely spoken over his grave by his 
widow. I have not ventured to leave out a sentence of it. Round the grave were 
gathered the friends of his creative period, the friends of his youth, the friends of 
his prison calling, from prison commissioner to warder, and a scattering of humbler 
friends unknown to most of us, but none the less there out of love to one of 
the finer spirits of this life. That brilliant June day, with its unique ritual, 
when we paid the last respects to Charles Goring, will remain in the memory of 
those present as unique as the nature of the man, who in leaving us reduces still 
further that little school of trained biometricians, who value humanism as well as 
science. 


I. CHARLES B. GORING AS A STUDENT.


I have been asked to write a few words about Charles Goring, and I have 
tried, because I respect the asker; but they will be incomplete because I have 
seen Goring of late so little and hardly knew him in maturity at all: as a husband, 
and a father, and an intellectual force with all his powers at their richest. But of 
the Charles whom, in the eighteen nineties, we knew, the Charles whom we loved, 
my impressions are fresh and will always be. His personality provided for that. 

I say “ whom we loved,” but I think we did more than love. I think that if it 
were possible, if it were conceivable, that any harm should be coming to him, 
there is nothing we would not have done to interpose our own inferior bodies 
between him and it. For he inspired not only affection but protectiveness. We 
felt that we were his guardians: his—in a very peculiar sense—owners. Not that 
he lacked any qualities of self-defence. Far from it. His mind was crystal clear, 
his attitude to life and its problems was fearless; but he had an unworldliness, 
a childlike radiance, that seemed to demand from his friends a contribution of 
cotton wool. Let me say again that he did not need this, but we all wanted to 
provide it. I have said that his attitude to life and its problems was fearless.
But it was more than that: it was challenging and ardent. Had there been nothing 
to probe and inquire into, he would not have been the happy man he was; for he 
was a born inquirer—inquisitor even—and mistrusted all traditional face-values. 


Exactly how I came to be admitted to Goring’s circle I never understood then, 
and cannot now fathom. Because where he and his friends brought to their dis- 
cussions and disputations knowledge and seriousness, I had nothing but instinct 
and impatience. But they suffered me, and I was permitted to sit on the outskirts 
and listen, and now and then to interrupt. What I chiefly remember of those 
evenings—at all kinds of places—at Highgate, at Hampstead, in rooms near the 
Museum, on the boat to Margate, on the Broads,—what I chiefly remember is 
Charles in argument : eager, stimulating, vivid, humorous, always gently reasonable 
and never losing sight of the main proposition. I suppose he was the honestest 
and most understandingly tolerant man that ever lived. He never trimmed; he 
rarely condemned ; and he had no fear. No fact was too stark and naked for him ; 
indeed, what he wanted was stark and naked facts. We would all have our say— 
some of us solid and some of us fluid—and then he would deal with us, with quiet 
Socratic questionings ; and all the while we would see, burning within his beautiful 
workmanlike brain, the soft steady flame of that lamp of enthusiasm which was 
never to be dimmed until a few weeks ago it was all too soon extinguished : 
enthusiasm for the truth, wherever found. 


Of what dark passages that lamp was to illumine it is not for me to speak. 
There are others who have authority. But that no sweeter nature was ever allied 
to a passion for scientific investigation I feel myself to have the right to affirm. 


E. V. Lucas. 


June 17, 1919. 
II. CHARLES GORING AS HUMANIST.


In asking you all to come here today, I have done what seems to me a right 
thing to do, and a beautiful one: for, with your presence, I have made a circle 
round my husband’s spirit of those minds and hearts most intimate with his, and 
most valued by him....You all loved him; he loved everyone of you. With each 
one of you he had a separate and private friendship....It seems to me that I can do 
him no greater honour on this day than to give him what you have let me give him 
by coming here—your undivided thought of him, your clear memory, and the warm 
and poignant tenderness that I well know possesses each heart here at the very 
mention of his name—Charles Goring. 


I must ask you to forgive me if I read from this paper what I have to say, 
instead of speaking it, in a more natural manner. I should not find any difficulty 
in speaking to each one of you separately. It seems absurd that, simply because 
you are all here together, in a number, that I should find it difficult. Yet so it is. 
And, therefore, for this reason, and also because on this occasion I can trust neither
my memory, nor my self-control, I hope you will forget—won’t even see—this bit 
of paper between us. 


I want to say, first, why I am speaking at all. There are two reasons. One is 
that I want to say something about my husband which may, perhaps, for a few 
instants, trace an outline of him upon the air, for you as well as for me—which 
may, for a moment, mark out his features for us, give us a glimmer of himself. 
That is one reason. The other is that I want to make, at his grave-side, and in 
the knowledge of death, certain affirmations. 


I have great difficulty in expressing myself here. I will ask you for your 
generosity with your tolerance. I ask it the more particularly because I know 
there is at least one amongst you—and probably there are more than one—who 
will find my attitude and desire foreign to his own. 


To this person, who has my respect, affection, gratitude, as I hope he knows, 
I want to say that, though I understand his inability to speak to us here today 
about my husband—and, in a way, I love him for that inability—yet I do regret 
it; and also I do not accept his point of view. 

My regrets are for the fact that his silence deprives us of a criticism, an 
appreciation of my husband—of my husband’s scientific mind and work especially — 
which no one else could give with equal authority, sincerity and eloquence. So 
there is room enough for regret I think....And then also, as I said, I do not accept 
my friend’s point of view, though I can salute it for its dignity. 

His view is, I understand, that reticence, and silence, and solitude best suit the 
great occasions of human experience—those of grief and loss, particularly. I feel—
I more than feel: I believe—the opposite. I believe in Voltaire’s saying: “Le 
but de l’homme c’est l’action.” Action means words as well as deeds. I believe 
that for whatever other purposes we may also possess life, there is a secret injunction 
upon us—within us—to express things: to do, to make, to show. And it seems to 
me—it is more than feeling: it is a sort of moral urging—that when the great 
emotional experiences come to us, we ought to give them some outward, visible 
sign: Form: form, in accordance with that law that, as I said just now, seems to 
me to impose action upon us during our humanity: form that is beautiful. 

I have felt, then, in the great experience which has just come to me—the 
greatest I shall ever know—that unless I am to be false to my own instincts, and 
a coward to my own truth, I must testify by some outer form, and beauty of 
symbol, to the quality of my husband's spirit, and the sacredness of his memory, at 
the hour of the burial of his body. 


Feeling, and believing this, I realise the disadvantage at which we stand—we 
who are Freethinkers—when, for our great occasions, we need a ceremonial, 
dignified, harmonious, simple. There, all the Churches, who have had time to grow 
old and beautiful, have the advantage of us. Their poets have had time to shape
inarticulate cries and struggling aspirations into pathetic and stately ritual. Their 
artists have had time to bring colour, and line and music to the spaces set aside 
for those who suffer, and those who conquer themselves. We here today—those
among us who are Freethinkers—would not give up for these achievements, fine 
though they are, for our own best: the very essence that makes us Freethinkers. 
Nevertheless, the Churches, in this way, have the advantage of us. And when a 
person like myself wants, as I do today, to mark by outer form and beautiful symbol, 
the great spiritual experience that has come to me, there is no ceremonial—the 
legacy of the genius of ages—waiting for me: and I am at a loss. So disconcerting
might this position have been, that it would have been easy to yield to the 
temptation that has for the last week assailed me: the temptation to do nothing; 
to give way to difficulty; to accept despair. All the time that I have felt it
urgent within me to do honour, somehow, to my husband, on the day of this 
burial—all that time I have also felt an unworthy fear of the effort: and I have
very often nearly decided to make none, but to have the ashes of his body buried 
without a sign, and myself alone as witness. 


I am glad I have cheated neither the memory of my husband, nor my own 
instincts, by doing that. I am glad that, by your presence, by the mysterious 
sense of the unity there is upon us in these moments—by the singing of these boys, 
whose music he loved, by these flowers, by the good fortune of an exquisite day of 
sunshine and warmth—I am glad that outward forms acknowledge the inward 
grace: I am glad that the influence of a lovely spirit is abroad in the air above 
this grave.


In speaking of my husband himself, I shall have to choose one quality only of 
him, I suppose, if I am to be clear. You will all know of others. And I shall not 
speak at all of his special intellectual gifts.... I think, perhaps, his rarest and
most endearing quality was his particular kind of humaneness. I say “his particular 
kind of humaneness” because it was not in the least like what is called “ humani- 
tarianism.” He had no sentimentality. And he was never in the least taken in 
by humbug. But his humaneness enabled him to know, and to like, the humanity 
even behind the humbug. There was in him at once a complete lack of prudery 
and a perfect personal rectitude. Charlie was as incapable of being shocking him- 
self as he was of being shocked at another person’s shockingness....The fact is that, 
apart from cruelty, he did not take what is called “evil” very literally. He thought 
that nearly all people were intensely likeable when you got to know them. So 
that his charity—of which everyone speaks who knows him—was far less forgiveness
than it was sympathy ; and his kindness was always loving-kindness. 





If you will let me, I will tell you one or two things about him that may, 
perhaps, trace that silvery outline of which I spoke....I think of a certain day, 
some years ago, when we had a really wonderful walk together. It was one of those 
fortunate days—those gift-days—when everything turns out successfully; when the
unexpected leaps up; when there is adventure through it all. I won’t give you the 
whole history of the walk, but only these points—to show you Charlie. 

We had just come down Villiers Street from the Strand, and were near the 
Embankment Gardens, when he pulled up suddenly with a look of intense 
alarm. He told me that one of his old convicts, discharged from Parkhurst, had
taken to newspaper-selling outside the Embankment Station. “ He talks for hours,” 
said Charlie desperately: “He has the eyes of a lynx. He spots me amongst 
thousands. He'll spot me in a minute. He always does. And that'll mean 
interminable conversation, and half-a-crown. Let’s get into the Gardens while 
there’s still a chance.” So we dived for the Gardens, and were just through the 
gate, when he again pulled up. “ After all,” he said, “I have managed to give him 
the slip three times lately. It seems rather unfair to cheat the poor old boy again, 
so soon. Let’s go by the station.” So we went by the station; and he was 
duly pounced upon, and a lengthy, amicable gossip ensued; and the half-crown 
passed from one pocket to the other.... Now that seems to me very like Charlie: 
that pang of conscience, that sense of fellowship which made him feel that, by 
evading him too often, he was, what he called, “cheating the old boy.” 


After this, we got out on to the Embankment. It was a wonderful day, I remember,
in early autumn. The river was stiffly rippled; the plane trees were
brilliant in colour and movement; rapid clouds were passing in the blue sky;
bright traffic was flashing and humming by in the broad roadway: it was a delight
to swing along the pavement, in the keen air, arm in arm—and quarrelling all the
time, as we mostly did! Presently we reached the Temple, and passed upwards 
through the narrow passages and dark archways, and across the smiling silence of 
the Courtyards: and then out again, into Fleet Street; and up through Chancery 
Lane to Holborn; and so to the left, towards Oxford Street. And when we were 
in New Oxford Street, I suddenly became aware of an astounding apparition on 
the opposite side of the road. Charlie observed people very acutely when he was 
in close contact with them, but he didn’t notice things in crowds. He didn’t 
notice this man; and he continued to arguefy at my side, while I continued to 
amaze myself at the man. 


This person seemed to take up the whole street. It was not so much that he 
was so large, as that he was so blatant. His clothes were the most astounding 
things in vulgarity and newness that could be conceived. He wore a buttonhole 
that was an insult. And the way his boots shone, and his hat shone, and his 
walking-stick shone as he twirled it—the way he simply glared and revolved in 
glory, as it were, with Oxford Street as a mere margin for him—simply took one’s 
breath away. 


I hadn’t time to pull my husband’s arm, and stop his Infinites and Indefinites, 
before the whole bulk of this being was descending upon us across the road, and 
clasping Charlie with a fervent hand. I left my poor man stuttering in his grasp ; 
and went and looked into the Cameo Shop window. It was perfectly clear that he 
hadn’t the remotest idea who the man was, though he was pretending he knew him! 
And presently the volubilities broke down, and I heard this: “I don’t believe 
you know me, Doctor? I am......” I didn’t catch the rest; but Charlie’s voice
cleared up in relief: “Oh, of course; of course!” and then proceeded to rapid and 
friendliest conversation. 
At last, with some terrific laughter, and a perfect blaze of complacency, I heard
the man exclaim: “Well, it’s all A.1—A.1, that’s what it is. A bit of All Right.
Everything’s going swimmingly. We're off to the Rhine for a few weeks’ holiday. 
Hope to see you again, Doctor. So long!”—and he was away, with a flourish of his 
hat, down the street towards Holborn; while my husband came up to me, convulsed 
with merriment, saying: “ Will you believe it? That’s another of them!” He was, 
in fact, another convict. Also from Parkhurst. A very bad case of fraud, I believe. 
Quite unpardonable. Still, there he was: free again: at large: enjoying every 
moment of his regained existence. And, as we watched him disappear towards 
Holborn in his outrageous radiancy, the spectacle didn’t merely amuse Charlie or 
stagger him,—it didn’t shock him, and it didn’t sentimentalise him: but it made 
him rejoice at the thing in the man that could so rejoice in liberty; that could 
swagger so in the sun; that could be so little of a snob, and so free from the Past, 
that it could actually come bounding, in ‘camaraderie,’ to an official of the Prison 
in which its convict days had been spent! That was all right in him, whatever 
else might be wrong: and it caught Charlie's affection: he liked the man. 


I hope this also may give you a touch of your friend. It is so difficult, with 
heavy words, to convey an intangible thing. But I hope your own knowledge of 
him may give you the feeling of his quality in all this. 


There is one thing more I want to tell you about him. It was last March, and 
we were in Manchester. We were having rather a rough time of it. We had no 
servant, and most of the rooms of the house were shut up, to reduce work and fires. 
The kitchen was our children’s playroom, and it was our dining-room as well. We 
had breakfast there, one morning, and were, as usual, distinctly late! and my 
husband had to hurry off immediately after into the Prison. It was a bitter 
morning : there was a perfect blizzard of snow and rain. Five minutes after he had 
gone, I heard his latch-key in the door, and he rushed back to the kitchen in 
a tremor of excitement and pity. He said he had found a woman in the street who 
was so ill she could hardly move. She was coughing herself to pieces; and he 
thought she had consumption, or some virulent form of influenza. He had brought 
her back with him. She was in the hall. 


And here I have a confession to make. I must make it, because if I do not, 
I cannot show you what Charlie was. I was angry with him. I was angry because 
I was in deadly terror of the children catching influenza. It seemed to me terrible, 
at the moment, to bring that poor, infected creature into the room where the 
children were—the one room where there was a fire. 


Well, it is the look he gave me when I was angry with him that I want to tell 
you about—a look in which there were not so much reproach and surprise (though 
these were there) as a kind of lovely guilt: a baffled look: a look pleading for 
pardon, and saying in desperation: “ Yes, I know, I know. But, in God’s name, 
what was I to do?” All this was in that look, which was the very essence of 
Charlie: and, without a word between us, I bundled the children upstairs, and we 
fetched the poor thing from the hall into the kitchen. 
I can see him now, settling her by the fire, bringing her a footstool, taking her
poor, dripping shawl off her shoulders, and hanging it up to dry. We thought for 
a time she was going to die; but she got better in a little while, and sat, un- 
complainingly, coughing; and, when she was not coughing, smiling at the fire. 
He had to tear off to the Prison, as soon as he could leave her, through the 
frightful storm, promising to bring help for her as soon as he could leave his work. 


He returned later in the morning, with an ambulance, and an order for a 
hospital: and again now, I can see him leading her carefully through the hall, 
lifting her into the carriage, nodding at her affectionately through the doorway, as 
the carriage drove off ; and then coming back to me for a moment, before he returned 
to his work, with that same muteness, that same look of an angel’s apology in 
his eyes.... 


This was Charlie absolutely—this passion of pity for suffering. In his last two 
days on earth, during the height of his delirium, one memory recurred, and haunted 
him over and over again: the memory of two little children whose case had been 
tried at the Assizes, and whose bodies he had had to examine, and had found 
marked and mutilated by the fiendish cruelty of their parents. These children he 
could not forget: he mourned and lamented them, seeing them before him in his 
fever, and calling, and calling upon us to take them, and save them.... 


If I had not told you this, I could not have shown you all that I meant by my 
husband’s humaneness: but I do not want the last impression that I, at any rate, 
leave with you, to be one of sadness. I want it to be one of happiness : because he 
was really an extraordinarily happy man. He was happy chiefly because of his 
nature and character, of course ; but, also, he was fortunate. He had got the things 
he most wanted in life. He never had any worldly ambitions at all. He had always 
wanted three things: first, freedom to live a life of the intellect—of observation, 
and of criticism ; and this he was able very largely to do, in spite of the fact that 
he also had to earn our living. And, secondly, he wanted Friendship: and he had 
Friendship. And, thirdly, he wanted Romantic Love: and he had Romantic Love.
The things he wanted and hoped for when he was young, he found, and still wanted
when he was middle-aged. And when he died at forty-nine, he took with him 
enthusiasms as eager as they were when he was twenty-five. 


I will say no more except to read you the inscription that I shall be putting over 
the place where his ashes will lie. 


For a great many years, I have had in my mind a line of words whose music and 
meaning I very much liked. I only vaguely knew where it came from. It corre- 
sponded to the Christian triad: “Faith, Hope, and Charity”; and it ran thus: 
“Love, Pity, and Equanimity.” 


During the last few days, when I was wanting to find something beautiful, and 
expressive of him, to put in words above my husband’s grave, I thought of this line 
again: and I have found out that it comes from a Buddhist Sutta. I am not very 
clear what a “Sutta” is? but I think it means a “Gospel.” This particular Sutta, 
from which I have got my line, describes the being who has attained the perfect
life—that is to say, the life of self-conquest. The passages from which I have made 
my extracts are these: 


(1) “And he lets his mind pervade one quarter of the world with thoughts of 
Love, and so the second, and so the third, and so the fourth. And thus the whole 
wide world, above, below, around, and everywhere, does he continue to pervade 
with heart of Love, far-reaching, grown great, and beyond measure.” 


(2) “Just as a mighty trumpeter makes himself heard—and that without 
difficulty—in all the four directions ; even so, of all things that have shape or life, 
there is not one that he passes by or leaves aside, but regards them all with mind 
set free, and deep-felt love, pity and equanimity.” 


The inscription as I shall put it above the grave will be this: 


“ Here lie, in Sacredness and Honour, 
the Ashes of the Body 
of 
CHARLES BUCKMAN GORING,

Doctor of Medicine, Bachelor of Science, 

Fellow of University College, London, 
and 

Medical Officer in Chief of Strangeways Prison, Manchester. 
Born January 31st, 1870; Died May 5th, 1919.”


And underneath I shall put this: 


“Of all things that have shape or life there is not one that he
passes by or leaves aside, but regards them all with mind set free 
and deep-felt love, pity, and equanimity.” 


KATIE MACDONALD GORING. 











ON THE NEST AND EGGS OF THE COMMON TERN 
(S. FLUVIATILIS). A COOPERATIVE STUDY. 


W. ROWAN, E. WOLFF, AND THE LATE P. L. SULMAN, Fieldworkers.
K. PEARSON, Reporter. 


E. ISAACS, E. M. ELDERTON, AND M. TILDESLEY,
Tabulators and Computers. 


(1) Origin of the Material and Method of Measurement. 


This paper may be looked upon as a continuation of that published in Biometrika, 
Vol. x. pp. 144—168. It is based upon a census of the eggs made July 3rd—20th,
1914, and contained in Rowan’s Fifth MS. Report on the Faunistics of Blakeney Point, 
the Field Station under Professor F. W. Oliver's direction on the Norfolk coast. The 
year was a record year for the common tern, a marked contrast to 1913, the young 
were abundant as well as the eggs, and many of the birds were still laying. Some 
peculiar nests were found: (a) one entirely of seaweed, (b) another of large wood 
shavings, (c) one of selected small pebbles, (d) a very large nest—the largest yet met
with. Some of the nests are illustrated in Plate II and will suffice to indicate the con- 
siderable differences between their make up and environment*. The range of ground 
colour with extent and distribution of mottling are indicated in Plate III, which 
should be taken in conjunction with Plate VIII of the earlier paper. There is 
every reason to believe that the two clutches, each of three eggs, were in both cases 
due to a single bird ; the seventh egg, from a one-egg clutch, represents a peculiar egg 
found in the examination of this year’s material. In all 515 clutches were recorded 
as against 203 in 1913. In that year there were 13 clutches with 3 eggs each ; in 
1914 there were 198, and many of those with one or two eggs at the time had also one
or two newly hatched chicks, bringing the total up to three. Even the nests with 
one egg (122 as compared with 119 in 1913) were actually nests with the first egg 
only of the clutch, for the birds were still laying, while most of the one-egg 
clutches in 1913 were either deserted or the egg addled.


Plate IV gives some further photographs taken of the Ternery. Fig. a is an
attempt to catch the bird alighting in order to indicate the great length of the 


* The following illustrates a method of nest building, that of nest (d) above. “A common tern laid
close to the observation tent. At first there was no material whatever. But on the same day a few of 
the Psamma leaves from the tent were taken and deposited round the egg. The next day another egg 
was laid and more stuff was added. None of the Psamma had then been broken and the leaves radiated
from the centre in all directions. On the second day the first few were broken and tucked neatly in 
all round. Then a third egg was deposited. More pieces of Psamma were added and the nest then had 
a very ragged appearance. It took two more days before the nest was completed and tidied up.” 
wings. Fig. b is the only four-egg clutch observed. Plates V and VI show birds
sitting, the camera being about 18 inches from the bird. 


The characters observed were identical with those of 1913, namely : 


1. Length ($L$); 2. Breadth ($B$); 3. Longitudinal Girth ($G_l$); 4. Transverse
Girth ($G_t$); 5. Tone or Ground Colour; 6. Mottling; 7. Type of Nest.


The tone or ground colour was in 1914, however, divided into browns and
greens. The scale of browns was that of the Colour Value Scale of Plate VIII of
the first paper, and the green values were judged on a similar scale divided into
corresponding classes a, b, c, d, e, f, g, h, i, k. These classes are distinguished by
the subscripts 1 and 2 for brown and green values respectively. Two eggs only
had to be excluded from these colour value observations; one was blue and the
other slatey gray in ground colour. These reduced the total number of eggs available
for colour value reduction from 1110 to 1108. The classification of mottling
follows Plate IX of the earlier paper. Types of Nest were divided into three
categories, $t_1$ = no hole in the ground and no materials, $t_2$ = a hole but no materials,
$t_3$ = both hole and materials. As only one nest (with a three-egg clutch) occurred in
type $t_1$, we have grouped $t_1$ with $t_2$, so that the distinction is really of unelaborated
and elaborated nests.


Of the characters dealt with, the transverse girth ($G_t$) was really taken as a
check on the general accuracy of measurements. We should have
$$\pi = \text{Mean Transverse Girth}/\text{Mean Breadth},$$
or rather $\pi$ is equal to this ratio multiplied by the factor $(1 - r_{G_t B}\, v_{G_t} v_B + v_B^2)$,
where $r_{G_t B}$ is the correlation of the transverse girth with the breadth and $v_{G_t}$ and
$v_B$ equal $\frac{1}{100}$ of the coefficients of variation of transverse girth and breadth
respectively. This factor was .99990 in the previous set of observations and is
1.00006 now. Hence its influence on $\pi = G_t/B$ is insensible for our purposes.
We find $\pi = 3.2071$ against 3.2237 of the earlier series. Thus although the value
of $\pi$ is bettered, we still find the transverse girth is somewhat exaggerated, i.e.
$\pi$ is about 2% in error when thus deduced. It might at first sight suggest itself
that the transverse section of the egg may not be truly circular. Suppose it an
ellipse of eccentricity $e$. Then if we agree that it is equally likely that the breadth
of the egg may be measured in any meridian we find
$$\frac{\text{Transverse Girth}}{\text{Mean Breadth}} = \pi\left(1 + \tfrac{1}{16}e^4\right),$$
if $e$ be small. If, however, we put in the values found, i.e. $G_t/B = 3.2071$, we have
$$e^4 = .3320,$$
leading to $b = .6510a$ for the relation between the semi-axes of the ellipse—a
quite impossible value. It may be suggested that our chance of taking every
breadth is not equal and that we are most likely to take the minimum breadth.
In this case we should have
$$G_t/B = \pi\left(1 + \tfrac{1}{4}e^2\right),$$
and with our numbers $e^2 = .0832$, leading to $b = .9576a$—an improbable but not so
impossible a relation as the former. It could hardly, however, escape observation, as
even slightly distorted eggs are easily recognised. It seems, therefore, probable that
the exaggeration of the girth in the transverse sense is due to the difficulty of
adjusting the tape to the true maximum transverse section—the temptation
being to bring the reading edge of the tape into contact with itself with the scale
facing outwards. If we suppose the celluloid scales to be 0.5 mm. thick this
would account for the deviation. Probably the longitudinal girth is exaggerated
in like manner.
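
The arithmetic of the two elliptical hypotheses can be retraced with a short Python sketch. It is an illustration only, not the authors' computation: it assumes the small-eccentricity forms $\pi(1 + e^4/16)$ for a breadth taken in a random meridian and $\pi(1 + e^2/4)$ for the minimum breadth, and the function names are hypothetical.

```python
import math

RATIO = 3.2071  # observed transverse girth / breadth for 1914, as quoted in the text


def random_meridian(ratio):
    """Solve ratio = pi * (1 + e**4 / 16) for e**4 and the axis ratio b/a."""
    e4 = 16.0 * (ratio / math.pi - 1.0)
    return e4, math.sqrt(1.0 - math.sqrt(e4))


def minimum_breadth(ratio):
    """Solve ratio = pi * (1 + e**2 / 4) for e**2 and the axis ratio b/a."""
    e2 = 4.0 * (ratio / math.pi - 1.0)
    return e2, math.sqrt(1.0 - e2)


# Percentage by which the observed ratio exceeds pi.
percent_error = 100.0 * (RATIO / math.pi - 1.0)
e4, b_over_a_random = random_meridian(RATIO)
e2, b_over_a_minimum = minimum_breadth(RATIO)

print(f"excess over pi: {percent_error:.2f} per cent")
print(f"random meridian: e^4 = {e4:.4f}, b/a = {b_over_a_random:.4f}")
print(f"minimum breadth: e^2 = {e2:.4f}, b/a = {b_over_a_minimum:.4f}")
```

Under these assumptions the sketch returns axis ratios within a few units in the third decimal place of the .6510 and .9576 quoted above; the small residue presumably reflects rounding in the original working.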

Unfortunately it did not apparently seem possible for the fieldworkers to adopt
a more elaborate system of classification for the mottling than was used in 1913 
and accordingly no further light is obtainable with regard to the difficulties 
suggested on p. 146 of the first paper. 


The question of possible pressure on the surface of the egg as it passes through 
the oviduct influencing the amount of pigment deposited was again investigated 
by considering the broader egg in each pair from the same clutch (see l.c. p. 146). 


The broader egg in every possible clutch pair has:


Greater mottling in 189 cases         More dense ground colour in 223 cases
The same mottling in ... cases        The same ground colour in ... cases
Less mottling in ... cases            Less dense ground colour in ... cases


Thus our 735 pairs confirm the previous result (on about 100 pairs) as far as the 
mottling is concerned, but not the density of ground colour. There is no dis- 
tinction in ground colour on the average between eggs of different breadths from 
the same hen, but the broader egg does appear to have less marked mottling. 
We shall consider later whether this result for eggs of the same clutch holds for
the general population. 


(2) Change of Type of Egg with Season. 








We have: 
TABLE I.

                                          Mean
Character                      Season 1913        Season 1914

Length $L$                     4.14 ± .007        4.21 ± .004
Breadth $B$                    2.98 ± .004        3.01 ± .002
Longitudinal Girth $G_l$       11.39 ± .015       11.56 ± .007
Transverse Girth $G_t$         9.59 ± .014        9.66 ± .006
Index 100B/L                   72.04 ± .136       71.75 ± .070
Index of Ovality $O$           56.35 ± .171       55.81 ± .088




It is clear from this table that the eggs of 1914 were significantly larger than 
those of 1913. As the fieldworkers remarked before the eggs were tabled and 
reduced, 1914 was a splendid contrast to 1913; never were so many birds seen
and the young were as abundant as the eggs. At first sight it seemed strange to 
find such a flourishing colony after the comparative failure of the previous year, 
but in the summer of 1914 the channel was phosphorescent at night with Plankton, 
and probably as a result of this the channel was also swarming with myriads of 
“ Whitebait,” which in their turn attracted the Terns. The suggestion is thus 
thrown out that a plentiful food supply increases the size of the eggs. It must, 
however, be borne in mind that possibly only the stronger and bigger birds survived 
the previous bad season. There may have been fewer very young or very old birds 
and thus the eggs larger. 
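
The claim of significance can be made concrete by dividing each difference of means by the probable error of that difference. The Python sketch below is illustrative only: it assumes that the ± figures of Table I are probable errors and that the two seasons may be treated as independent samples, and the names used are hypothetical.

```python
import math

# (mean 1913, probable error 1913, mean 1914, probable error 1914), read from Table I
TABLE_I = {
    "Length L": (4.14, 0.007, 4.21, 0.004),
    "Breadth B": (2.98, 0.004, 3.01, 0.002),
    "Longitudinal Girth": (11.39, 0.015, 11.56, 0.007),
    "Transverse Girth": (9.59, 0.014, 9.66, 0.006),
    "Index 100B/L": (72.04, 0.136, 71.75, 0.070),
    "Index of Ovality": (56.35, 0.171, 55.81, 0.088),
}


def difference_ratio(mean_a, pe_a, mean_b, pe_b):
    """Difference of two independent means over the probable error of that difference."""
    return (mean_b - mean_a) / math.hypot(pe_a, pe_b)


ratios = {name: difference_ratio(*row) for name, row in TABLE_I.items()}
for name, value in ratios.items():
    print(f"{name:20s} {value:+7.2f}")
```

On this reckoning the four size characters all differ by more than four times the probable error of the difference, while the two shape indices differ by less than three times, which agrees with the reading that the 1914 eggs were larger rather than differently shaped.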


We may now consider the variabilities of the two years. 























TABLE II.

                               Standard Deviation               Coefficient of Variation
Character                      1913            1914             1913             1914

Length $L$                     .180 ± .035     .185 ± .003      4.34 ± .12       4.39 ± .006
Breadth $B$                    .099 ± .010     .099 ± .001      3.33 ± .09       3.28 ± .005
Longitudinal Girth $G_l$       .376 ± .010     .350 ± .005      3.30 ± .09       3.03 ± .005
Transverse Girth $G_t$         .347 ± .010     .300 ± .004      3.62 ± .10       3.10 ± .005
Index 100 B/L                  3.449 ± .096    3.479 ± .050     [4.79 ± .13]*    [4.84 ± .069]
Index of Ovality $O$           4.334 ± .121    4.326 ± .062     7.69 ± .22       7.75 ± .111








The table indicates that the material for 1914 is slightly less variable than 
that of 1913 taken as a whole. This is possibly due as we have suggested to the 
bad season of 1913 reducing the number of very young or very old birds and so the 
small eggs in 1914. But most of the differences are insignificant except those in the 
two girths. We anticipate that a good deal of interest from the evolutionary stand- 
point might be reached by secular observations on the eggs of this tern colony, 
taken in conjunction with records of the food supply and climate both in the 
nesting season and after. It would be of interest also to mark certain birds and 
record if possible their return. 
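
The entries of the two tables can be cross-checked against one another. The Python sketch below is an illustration under stated assumptions: the coefficient of variation is taken as 100 × standard deviation / mean, the ± attached to a mean is taken as the probable error .67449 σ/√n, and n is taken as the 1110 eggs mentioned earlier. The function names are hypothetical.

```python
import math

N_EGGS_1914 = 1110  # total eggs measured in 1914, as stated in the text

# (mean, standard deviation) for 1914, read from Tables I and II
SERIES_1914 = {
    "Length L": (4.21, 0.185),
    "Breadth B": (3.01, 0.099),
    "Longitudinal Girth": (11.56, 0.350),
    "Transverse Girth": (9.66, 0.300),
}


def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * sd / mean


def probable_error_of_mean(sd, n):
    """Probable error of a mean, assuming a normal distribution of errors."""
    return 0.67449 * sd / math.sqrt(n)


cv = {k: coefficient_of_variation(m, s) for k, (m, s) in SERIES_1914.items()}
pe = {k: probable_error_of_mean(s, N_EGGS_1914) for k, (m, s) in SERIES_1914.items()}
for name in SERIES_1914:
    print(f"{name:20s} CV = {cv[name]:.2f}   p.e. of mean = {pe[name]:.3f}")
```

With these assumptions the computed coefficients of variation agree with the 1914 column of Table II to within about one unit in the second decimal place, and the probable errors of the means round to the figures printed in Table I.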


(3) Associations of Nest and Egg Pattern. 


It is of great interest to discover whether there is any protective action in the 
colouring and mottling of the egg. In an egg which varies in itself so largely as 
the tern’s this question must be considered not so much in regard to the general 
nesting habits of the species, but in regard to the nest and environment of each 
individual bird. The occasional and possibly habitual practice (see our ftn. 
p. 308) of laying and nest building simultaneously may indeed suggest that the 
birds adapt the immediate environment and material of the nest to the actual 


* See remarks, footnote †, p. 147 of previous paper.
character of their eggs. If the egg in shape, colour-value and mottling be related
to the individual nest, it is hardly conceivable that a hen, especially when a young
bird, can a priori appreciate what the type of her egg is likely to be and prepare
the corresponding protective nest accordingly. Such an instinct would be conceivable
in the case of a species with more uniform eggs and building a specific
type of nest; it is hard to conceive it possible in the case of such a wide colouring
and mottling range as we find in the common tern. The alternative is to suppose 
a considerable variety of tern gentes, who like the suggested cuckoo gentes select 
a particular environment for their eggs. Such a suggestion is not without 
difficulty; it involves mating within the gens, or a transmission of the egg colour- 
ing mechanism through the female only. To accept the latter is not consonant 
with our experience that sexual characters of the female are transmitted through the 
male, i.e. the fertility of the mare and the character of a cow’s milk are correlated 
with the like characteristics in their paternal grandmothers. It is conceivable that 
the pigmentation may vary to some extent with the immediate food supply. In
this case green and brown eggs of the same shape and size within the same 
clutch might be more readily accounted for than by the hypothesis of two hens of 
different gentes using the same nest*. It might also admit of the hen having 
some inkling of the character of her forthcoming eggs, if the nest be made before- 
hand. Besides this it would free us from any hypothesis as to tern gentes. 
Thus far we have written as if the protective colouring of eggs was a demonstrated
phenomenon. It is highly probable in the case of many species building
specific nests in specific environments. Can it be asserted of the common tern ? 
If not, elaborate and most varied colouring and mottling would appear to be 
physiological, and originate before they attain protective character. In other words
egg patterns have been specially selected for protective purposes, but did not 
originate in the survival of the better protected. 


It will be remembered that we have divided our nests into the unelaborated 
nests, i.e. nests with no material, and with no hole, or merely a hole in the ground, 
and elaborated nests or nests formed by a hole and with accumulated material. 
We shall denote these by S and C, i.e. simple and complex†. We will consider
first absolute size as measured by the longitudinal girth, $G_l$.


The following table gives the data. The mean of the S-nest eggs is 11.556 as
against 11.373 for the total population. The correlation found by the biserial r
method was

r = + .0685 ± .0322.


* Clutch a, figured in Plate III, shows three eggs practically identical in shape and size yet of very 
different ground colour. Since the size is quite abnormal—being the smallest found in 1914—one can 
hardly believe that three birds laid three such eggs in one and the same nest! Again in the Psamma 
nest referred to in the ftn. p. 308, the three eggs were laid on three successive days; two eggs were 
alike in colour, but the third completely different. 

† Actually of course every degree of elaboration can occur with a hole and every degree of accumulation
of material. Thus although we have only two categories these cover practically continuous
grades of elaboration and justify the use of biserial r method of determining the association.
This relationship is hardly significant, and if significant only of very small intensity.
It would indicate that the eggs of greater longitudinal girth were on
the whole deposited in the more elaborated nests. 


To investigate the matter more closely we now correlated the length and 
breadth of the egg with the nature of the nest, obtaining Tables IV and V. 


Here the mean of the egg lengths in the simple nests is 4.177 cms. and for the
total population 4.206 cms. while the correlation is given by


r = + .0953 ± .0321.


This is probably just significant although only slightly larger than that for 
the longitudinal girth. 


TABLE V. 
Correlation of Nest Type and Breadth of Egg. 
Breadth of Egg. 
* 2.595— denotes all values from 2.595 to 2.645, i.e. all the recorded values to two decimals from
2.60 to 2.64.


Here the mean of the egg breadths for the simple nests is 3.028, while for the
total population it is 3.013. We have
r = − .0952 ± .0321,
or the broader eggs are on the whole in the less-elaborated nests. Thus far then
the rough nests appear associated with a short broad egg, although the correlations
are only slight.


With a view of analysing this point further we now investigate the correlation 
of the index with the type of nest. 


TABLE VI. 
Correlation of Nest Type and Egg Index B/L. 


Values of Index. 
* 57.95— denotes all values from 57.95 to 59.95, i.e. all recorded values from 58.0 to 59.9, the indices
being recorded to one decimal place.
The mean index for the rougher nest was 72.530 and for the general population
71.752. We find for the correlation:


r = − .1372 ± .0319,


a value greater than in the case of either length or breadth, the less elaborated 
nests having the rounder egg. 
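
The biserial correlations for length, breadth and index can be retraced from the means quoted in the text. The Python sketch below is illustrative and rests on several assumptions: the usual biserial form r = (M_S − M)/σ · p/z is used, where p is the proportion of eggs in simple nests and z the normal ordinate at the point of division; p is taken as 142/1108 from the nest totals; and σ is the 1914 standard deviation of Table II. The function name is hypothetical.

```python
from statistics import NormalDist

N_SIMPLE, N_TOTAL = 142, 1108  # eggs in simple nests and in all nests


def biserial_r(mean_group, mean_total, sd_total, p):
    """Biserial r between a measured variate and membership of the group of proportion p."""
    nd = NormalDist()
    z = nd.pdf(nd.inv_cdf(1.0 - p))  # normal ordinate at the dividing point
    return (mean_group - mean_total) / sd_total * p / z


p = N_SIMPLE / N_TOTAL

# The text reports the correlation with elaboration of nest, so the sign of the
# correlation with the simple-nest group is reversed here.
r_length = -biserial_r(4.177, 4.206, 0.185, p)
r_breadth = -biserial_r(3.028, 3.013, 0.099, p)
r_index = -biserial_r(72.530, 71.752, 3.479, p)

print(f"length  r = {r_length:+.4f}")
print(f"breadth r = {r_breadth:+.4f}")
print(f"index   r = {r_index:+.4f}")
```

The values so obtained agree with the printed + .0953, − .0952 and − .1372 to within the rounding of the quoted means and standard deviations, which suggests that the formula assumed here is close to what the computers used.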


We now took $C = B^2 \times L$ as a rough measure of the volume of the egg and
found:


r = − .0223 ± .0322,


or r is sensibly zero. In other words there is no relation of volume of egg to the 
type of the nest. Since we might suppose the younger bird to lay smaller eggs, 
or at any rate less broad eggs, the solution of the simple nests being due to 
young birds finds no confirmation in our analysis; it is the shape of the egg 
rather than its size which is associated with its environment. In order to test
this further the lower portion of the axis, L − ½B, and the Second Index of
Ovality*, 100 (L − ½B)/B, were correlated with the type of nest.


They gave respectively:
r = +.1233 ± .0319,
and r = +.1492 ± .0318.











In other words the greater the extension below the hemisphere and the greater 
the ovality the more likely the nest to be elaborated. Thus we see that the rotund 
egg is more characteristic of the careless nest. It is conceivable that the rounder 
the egg the less likely it is to catch the eye when laid amid small pebbles and 
shingle. We next turn to investigate the association of colour and mottling with 
type of nest. First we inquire as to the simple relation of green and brown to 
the nest. Here we cannot go further than a fourfold table: 


* The relative advantage of O₂ = 100 (L − ½B)/B and O₁ = 100 B/(L − ½B) consists solely in the ovaloid
character of the egg increasing as O₂ increases, while it decreases as O₁ increases. Either may really
be used indifferently if this be borne in mind.
TABLE VII.
Type of Nest and Colour.

                 Type of Nest.
Colour      |   R   |   C   | Totals
Brown ...   |   63  |  376  |   439
Green ...   |   79  |  590  |   669*
Totals      |  142  |  966  |  1108











One ‘slatey grey’ egg and one ‘blue’ egg had to be omitted from this table. 


We find for tetrachoric r
r = +.0745 ± .0409.
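The tetrachoric coefficient for the fourfold table can be approximated numerically. The following is a minimal sketch, not necessarily the computation used in the memoir: it assumes an underlying standard bivariate normal surface, uses the identity that the derivative of the quadrant probability with respect to the correlation equals the bivariate density, and finds the correlation by bisection. The function names are illustrative.

```python
import math
from statistics import NormalDist

def bvn_density(h, k, t):
    """Standard bivariate normal density at (h, k) with correlation t."""
    one_minus = 1.0 - t * t
    expo = -(h * h + k * k - 2.0 * h * k * t) / (2.0 * one_minus)
    return math.exp(expo) / (2.0 * math.pi * math.sqrt(one_minus))

def upper_quadrant(h, k, r, steps=200):
    """P(X > h, Y > k): independence value plus the integral of the density over 0..r (Simpson)."""
    nd = NormalDist()
    base = (1.0 - nd.cdf(h)) * (1.0 - nd.cdf(k))
    if r == 0.0:
        return base
    width = r / steps
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, r)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * bvn_density(h, k, i * width)
    return base + total * width / 3.0

def tetrachoric(a, b, c, d):
    """Tetrachoric r for the fourfold table [[a, b], [c, d]], found by bisection."""
    n = a + b + c + d
    nd = NormalDist()
    h = nd.inv_cdf(1.0 - (a + b) / n)  # row threshold
    k = nd.inv_cdf(1.0 - (a + c) / n)  # column threshold
    target = a / n
    lo, hi = -0.99, 0.99
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if upper_quadrant(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Table VII: brown (63, 376), green (79, 590); first column the simpler nests.
r_tet = tetrachoric(63, 376, 79, 590)
print(round(r_tet, 4))
```

With the table oriented this way a positive value means brown eggs go with the simpler nests, i.e. green with the more elaborate, which is the reading given in the text.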


This cannot in itself be considered significant. The sign indicates that green 
egg-layers make the more elaborate nests. No stress can, however, be laid on the 
result. 


We now take mottling and type of nest using the arrangement below as the 
best order we could devise of decreasing mottling. 


TABLE VIII. 


Mottling and Type of Nest. 
Categories of Mottling. 






































| Type of Nest |  d |   e |  g | a+b |   c |  h |   f |  i | Totals |
| R            |  1 |  39 | 10 |   7 |  58 |  5 |  20 |  3 |    143 |
| C            | 15 | 198 | 71 |  61 | 430 | 24 | 135 | 31 |    965 |
| Totals       | 16 | 237 | 81 |  68 | 488 | 29 | 155 | 34 |   1108 |





The method adopted was that of ‘biserial η’ with class index correction for the
mottling categories. We find, the class index correlation being .9534,


Correlation = +.1141 ± .0325,


the sign indicating that the finer blotches are associated with the more elaborate 
nests. 


* The preponderance of green eggs over brown in the ternery at Blakeney Point deserves
consideration because it has not always been recognised. H. Seebohm, Eggs of British Birds, London,
1896, writes that the eggs “vary in ground colour from pale greyish-buff to brownish-buff, occasionally
with a tinge of green” (p. 102). F. O. Morris, Natural History of the Nests and Eggs of British Birds,
London, 1892, gives a wider range of colours, “pale blue, pale yellow, green, brown, white or light dull
yellowish or stone colour” (Vol. 11. p. 186), which certainly does not emphasise the broad alternative
categories brown or green, with a fractional percentage of blue or grey.

Lastly we turn to the intensity of the ground colour and the type of nest.
Here we have worked independently brown and green eggs and the results are 
given in Tables IX and X. 


TABLE IX. 


Type of Nest and Density of Colour in Brown Eggs. 
Density of Colour. 





| Type of Nest |  A |  B |  C |  D |  E | F+G | H+I+K | Totals |
| R            | 15 | 11 |  5 | 12 |  8 |   6 |     6 |     63 |
| C            | 39 | 65 | 58 | 83 | 43 |  38 |    50 |    376 |
| Totals       | 54 | 76 | 63 | 95 | 51 |  44 |    56 |    439 |
































Again we use the ‘biserial η’ method and correction for class index correlation
(.9785); we find
Correlation = +.2189 ± .0481.


Thus there is significant, if only still very moderate, correlation, the relation- 
ship being between denser brown ground colour and the simpler nests, i.e. holes in 
the ground. 


TABLE X. 
Type of Nest and Density of Colour in Green Eggs. 
Density of Colour. 











| Type of Nest |  A |  B |  C |   D |  E |  F |  G | H+I+K | Totals |
| R            |  3 |  5 | 14 |  10 | 12 |  6 | 15 |    14 |     79 |
| C            | 31 | 46 | 57 | 123 | 73 | 79 | 54 |   127 |    590 |
| Totals       | 34 | 51 | 71 | 133 | 85 | 85 | 69 |   141 |    669 |
































Using the same method as before (class index correction .9860), but with one
more category, as F and G could be separated as their total was more considerable,
we have
Correlation = −.2366 ± .0407.


Thus the dark tones of green are on the whole more frequently associated with 
the nests to which material is brought. 


Accordingly in the case of both ground colours, although we cannot definitely 
assert that either brown or green egg-layers are the more elaborate nest builders, 
we can assert that the denser brown and lighter greens are somewhat more usual 
when the nest is a mere hole in the shingle, and that the lighter brown and 
darker green eggs are associated with more elaborately constructed nests. Again 
the larger blotches are in somewhat greater proportion to be associated with
unelaborated nests and the finer mottling with the elaborate nests. There is no
reason to believe in any appreciable difference in volume of the simple nest and
the complex nest eggs, but the former differ somewhat in shape from the latter, 
being broader and shorter, i.e. the eggs in mere holes are more rotund and in the 
elaborate nests more ovaloid. 


Although none of these characters appear to be highly correlated with the type 
of nest as determined by the simple alternative categories adopted by the field- 
workers, yet they are of a nature which more or less lend themselves to explanation 
on the basis of a protective colouring. It is not possible to determine whether 
the great variety of colouring and mottling in the common tern’s egg is a vestige 
of an elaborate system once developed for protective purposes, and now falling 
into disuse, or, as a product of physiological causes, it is now being slowly adapted to 
protective purposes. The problem is a very interesting one and further light we 
think might be thrown on it, if a fuller record were in future to be made of the 
immediate colouring of the nest,—the colour of the materials out of which it 
is made, and in the case of holes the colour of the ground, shape and nature and 
colour of the adjacent pebbles or shingle. It would mean much additional labour, 
but considerable information bearing on the points discussed above might arise 
from such data. 


(4) The Problem of the Mixed Colour Clutches. 


We propose in this section to discuss the problem of the mixed colour clutches. 
The following are the data to be analysed : 


TABLE XI. 
Colour Composition of Clutches. 

















| Size of Clutch | Number | Colour Composition                                | Eggs |
| 1              |    138 | 74 B + 63 G + 1 SG                                |  138 |
| 2              |    178 | 67 B² + 19 BG + 92 G²                             |  356 |
| 3              |    204 | 62 B³ + 8 B²G + 14 BG² + 119 G³ + 1 BL            |  612 |
| 4              |      1 | 0 B⁴ + 0 B³G + 0 B²G² + 0 BG³ + 1 G⁴              |    4 |
| Totals         |    521 | 203 B only, 41 composite, 275 G only, 2 anomalous | 1110 |








Bⁿ = n brown, Gᵐ = m green, SG = slatey-grey, BL = blue eggs*.


Putting aside the two anomalous eggs, we have 41 clutches out of 519 wherein 
brown and green eggs are mixed. Putting aside the clutch with 4 green eggs we 
see that as a whole there are 


443 brown eggs to 659 green eggs, 


* The blue egg may be accounted for by the oxidisation of a green egg—a phenomenon observed by 
Newton (Art. ‘ Birds’ Eggs,’ Encycl. Brit.) ; the origin of the oxidisation being unrecognised in this case. 
Newton also states that the individuals of some few species of birds do not always lay eggs of the same 
ground colour, but the source indicated by him, i.e. change with age of bird, would not apply to our case. 
but the proportions vary with the size of the clutch; for we have


74 brown to 63 green eggs in the clutches of 1, 
153 brown to 203 green eggs in the clutches of 2, 
216 brown to 393 green eggs in the clutches of 3, 


or 100 to 85, 100 to 133, 100 to 184 brown to green eggs respectively. In other
words the proportion of green to brown eggs increases with the size of the clutch.
Those readers who will examine Plate VIII in the first memoir* will see how
distinct the brown and green ground colours are, and will understand how necessary
it is to find some explanation for the change in proportions of colour as the clutch
increases in size, and for the mixture of colours in the same nest. The field-
workers appear to be confident that the same bird can lay different coloured eggs,
basing their statement apparently on diversity of colour appearing in clutches of
eggs having the same size or shape.
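The egg counts quoted above follow directly from the clutch compositions of Table XI. A minimal sketch of that bookkeeping, with the anomalous clutches and the single clutch of four left out as in the text (the dictionary layout and function name are illustrative choices):

```python
# Clutch compositions from Table XI: (brown eggs, green eggs) -> number of clutches.
clutches = {
    1: {(1, 0): 74, (0, 1): 63},
    2: {(2, 0): 67, (1, 1): 19, (0, 2): 92},
    3: {(3, 0): 62, (2, 1): 8, (1, 2): 14, (0, 3): 119},
}

def egg_totals(compositions):
    """Return (brown eggs, green eggs) summed over all clutches of one size."""
    brown = sum(b * n for (b, g), n in compositions.items())
    green = sum(g * n for (b, g), n in compositions.items())
    return brown, green

totals = {size: egg_totals(comp) for size, comp in clutches.items()}
all_brown = sum(b for b, g in totals.values())
all_green = sum(g for b, g in totals.values())

for size, (b, g) in totals.items():
    print(size, b, g, round(100 * g / b))  # green eggs per 100 brown
print(all_brown, all_green)
```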


The hypotheses that suggest themselves are: 


(i) That the common terns consist of two gentes one of which lays brown and 
the other green eggs. The mixture of colour arises from the existence of ‘ cuckoo’ 
terns who lay in other hen’s nests. 


We cannot ascertain the number of brown egg-laying tern ‘cuckoos’ who lay
in brown egg nests or of green egg-laying tern ‘cuckoos’ who lay in green egg
nests. But if the 19 BG arise from cuckoo-terns, we must originally have had
74 + 63 + 19 single egg nests and in these 156 nests 19 tern ‘cuckoos’ of opposite
colour laid. The chance therefore of a tern ‘cuckoo’ of opposite egg colour laying
in the 1 egg nests is .1218. Treating the 2 egg nest in the same manner, we
have 67 + 92 + 8 + 14 = 181 of them and in 22 we have occurring the egg of the
tern cuckoo of opposite colour, or the chance is .1215; this number is
substantially the same as we reached before and the coincidence is remarkable. But
it collapses when we go a stage further. We have 62 + 119 whole colour clutches
of 3, we should therefore expect 25 clutches of 4 with composite colours, i.e.
25/(62 + 119 + 25) = .122 nearly. Now only a single 4 clutch nest was found and
this had all green eggs. With a chance of about 1 in 8 that a cuckoo-tern will
lay in any nest, it is hard to believe that it missed at least 181 nests. It appears
that three eggs is the practical limit to the size of the clutch laid by one hen, but
it seems hard to believe that the cuckoo-tern would avoid all nests which already
had three eggs, i.e. the cuckoo-tern hypothesis seems to involve a considerable
percentage of composite four egg clutches, which do not appear. This argument
seems sufficient to render the hypothesis very improbable.
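The arithmetic of the ‘cuckoo’ argument can be retraced in a few lines. This sketch only restates the counts of Table XI; the variable names are illustrative.

```python
# One-egg nests that would have existed before any 'cuckoo' laid: whole-colour
# singles plus the 19 mixed pairs.
one_egg_nests = 74 + 63 + 19
chance_one = 19 / one_egg_nests            # chance of an opposite-colour 'cuckoo' egg

# Two-egg nests: whole-colour pairs plus the mixed triples (8 + 14).
two_egg_nests = 67 + 92 + 8 + 14
chance_two = (8 + 14) / two_egg_nests

# Three-egg nests: if the same chance held, x composite clutches of four would
# satisfy x / (181 + x) = chance_two, with 181 = 62 + 119 whole-colour triples.
whole_triples = 62 + 119
expected_fours = chance_two * whole_triples / (1 - chance_two)

print(round(chance_one, 4), round(chance_two, 4), round(expected_fours))
```

About 25 composite clutches of four would be expected on this hypothesis, against none observed.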


(ii) There is only one gens of the common tern which can lay both brown
and green eggs. Since, however, the number of green eggs increases with the
size of the clutch, it is not possible to consider the chance of laying a brown,
respectively a green egg as the same in successive layings. The change of pigment
in successive layings may be a physiological exhaustive process as a change from
a melanin to a lipochrome. This hypothesis does not assume that any given bird
may or may not lay a green or brown egg according to a given law of chance,
but that physiologically there is a tendency with successive laying to alter the
nature of the pigment in the glands or on the surface of the oviduct. For
example the hen, as the incubation period approaches, may change the quantity
or character of her food.


* Biometrika, Vol. x. p. 146.
† Or that the rightful owner having laid two eggs would refrain from laying the third because the
‘cuckoo’ tern had already laid it.


It is probable, however, that the changes will not be the same for small and 
large layers, we shall therefore give generality to the problem by supposing the 
probability of laying a brown egg to vary not only with the number of eggs laid 
but with each egg. 


We have then the following system of notation: $p_s', p_s'', p_s''', \ldots$ = chance of
laying a brown egg in the 1st, 2nd, 3rd, ... laying of a hen who lays a clutch of
$s$ eggs. The corresponding chances of laying green eggs will be $q_s' = 1 - p_s'$,
$q_s'' = 1 - p_s''$, $q_s''' = 1 - p_s'''$, .... Let there be $N_s$ $s$-clutch common tern hens.

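Then our data are to be provided by the equations:

$$N_1 p_1' + N_1 q_1' = 74 + 63 \quad (N_1 = 137),$$
$$N_2 p_2' p_2'' + N_2 (p_2' q_2'' + q_2' p_2'') + N_2 q_2' q_2'' = 67 + 19 + 92 \quad (N_2 = 178),$$
$$N_3 p_3' p_3'' p_3''' + N_3 (p_3' p_3'' q_3''' + p_3' q_3'' p_3''' + q_3' p_3'' p_3''') + N_3 (p_3' q_3'' q_3''' + q_3' p_3'' q_3''' + q_3' q_3'' p_3''') + N_3 q_3' q_3'' q_3''' = 62 + 8 + 14 + 119 \quad (N_3 = 203).$$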

Dividing out by the totals in each case and equating corresponding terms we
have the following system of equations to solve:

$$p_1' = .540{,}1460, \quad q_1' = .459{,}8540 \qquad \text{(i)},$$
$$p_2' p_2'' = .376{,}4045, \quad p_2' q_2'' + p_2'' q_2' = .106{,}7416, \quad q_2' q_2'' = .516{,}8539 \qquad \text{(ii)},$$
$$p_3' p_3'' p_3''' = .305{,}4187, \quad p_3' p_3'' q_3''' + p_3' p_3''' q_3'' + p_3'' p_3''' q_3' = .039{,}4089,$$
$$p_3' q_3'' q_3''' + p_3'' q_3' q_3''' + p_3''' q_3' q_3'' = .068{,}9655, \quad q_3' q_3'' q_3''' = .586{,}2069 \qquad \text{(iii)}.$$

(i) is solved as it stands. But it is clearly impossible to take $q_2' = q_1'$, for this
would involve $q_2''$ being greater than unity, an impossible value. Similarly
$q_3'$ and $q_3''$ cannot be equal to $q_2'$ and $q_2''$ respectively, or we should have $q_3''' > 1$.
Thus it is needful that the probability of laying a green egg should increase
with successive eggs or be a function of the fertility. Assuming this change of
probability, we may write the first equations of (ii), the third is not independent:

$$p_2' p_2'' = .376{,}4045, \quad p_2' (1 - p_2'') + p_2'' (1 - p_2') = .106{,}7416,$$

which gives us $p_2' + p_2'' = .859{,}5506$, or $p_2'$, $p_2''$ are roots of the quadratic

$$p_2^2 - .859{,}5506\, p_2 + .376{,}4045 = 0.$$

These roots are imaginary.
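That the roots are imaginary follows from the sign of the discriminant. A short check from the observed two-egg frequencies (variable names are illustrative):

```python
import cmath

# Observed proportions of two-egg clutches: both brown, and one of each colour.
both_brown = 67 / 178
mixed = 19 / 178

# p' p'' = both_brown and p'(1 - p'') + p''(1 - p') = mixed
# give p' + p'' = mixed + 2 * both_brown.
root_sum = mixed + 2 * both_brown
discriminant = root_sum ** 2 - 4 * both_brown

roots = [(root_sum + cmath.sqrt(discriminant)) / 2,
         (root_sum - cmath.sqrt(discriminant)) / 2]
print(round(root_sum, 7), discriminant, roots)
```

A negative discriminant means no pair of real probabilities can reproduce the observed two-egg frequencies under this hypothesis.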
Turning now to (iii) we find from the first three equations:

$$p_3' p_3'' p_3''' = .305{,}4187,$$
$$p_3' p_3'' + p_3'' p_3''' + p_3''' p_3' = .955{,}6650,$$
$$p_3' + p_3'' + p_3''' = 1.064{,}0394.$$

These lead to the cubic for $p_3$,

$$p_3^3 - 1.064{,}0394\, p_3^2 + .955{,}6650\, p_3 - .305{,}4187 = 0.$$

One root of this cubic is $p_3 = .449{,}5251$, which gives on dividing the factor
$p_3 - .449{,}5251$ out

$$p_3^2 - .614{,}5143\, p_3 + .679{,}4254 = 0.$$

The roots of this quadratic are both imaginary.
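The cubic and its deflation can be reproduced from the three-egg frequencies. A sketch in plain Python, finding the single real root by bisection and then dividing it out (names are illustrative):

```python
# Observed proportions of three-egg clutches: BBB, BBG, BGG.
bbb, bbg, bgg = 62 / 203, 8 / 203, 14 / 203

# Elementary symmetric functions of p', p'', p''' implied by the frequencies.
product = bbb
pair_sum = bbg + 3 * bbb                    # sum of products two at a time
root_sum = bgg + 2 * pair_sum - 3 * bbb     # sum of the three chances

def cubic(p):
    return p ** 3 - root_sum * p ** 2 + pair_sum * p - product

# The cubic is increasing on [0, 1], so bisection finds its single real root.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if cubic(mid) < 0:
        lo = mid
    else:
        hi = mid
real_root = (lo + hi) / 2

# Dividing out (p - real_root) leaves p^2 - b p + c.
b = root_sum - real_root
c = product / real_root
quadratic_discriminant = b * b - 4 * c
print(round(real_root, 7), round(b, 7), round(c, 7), quadratic_discriminant)
```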
Accordingly neither the records for the nests with two eggs nor those for the
nests with three eggs are consistent with a single gens the hens of which lay
brown eggs with a tendency to lay green increasing with greater fertility. This
hypothesis has therefore to be discarded.


(iii) As a last hypothesis we will assume that there are two gentes or types
of females, one of which lays brown eggs ($p_1$) with a small chance of laying green
($q_1 = 1 - p_1$), and the other of which lays green eggs ($p_2$) with a slight chance of
laying brown ($q_2 = 1 - p_2$). Let $N_s \nu_s$, $N_s (1 - \nu_s)$ be the number of brown and
green laying hens in the group $N_s$ which lays $s$ eggs in the clutch. We suppose
$p_1$ and $p_2$ to be independent of the fertility of the hen, until this assumption is
shown to be inadequate.

Clutches of 1 egg.

$$N_1 \nu_1 p_1 + N_1 (1 - \nu_1) q_2 = \text{number of brown eggs} = N_1 e_1', \text{ say},$$
$$N_1 \nu_1 q_1 + N_1 (1 - \nu_1) p_2 = \text{number of green eggs} = N_1 e_2', \text{ say}.$$

For our special case:

$$\nu_1 p_1 + (1 - \nu_1) q_2 = .540{,}1460,$$
$$\nu_1 q_1 + (1 - \nu_1) p_2 = .459{,}8540.$$

These equations are not, however, independent and only suffice to determine $\nu_1$
from

$$\nu_1 = (e_1' - q_2)/(p_1 - q_2) \qquad \text{(iv)},$$

or the proportions of brown and green egg layers in clutches of one, when $p_1$
and $q_2$ have been found.

Clutches of 2 eggs.

If the distribution of clutches be $N_2 (e_1'' + e_2'' + e_3'')$:

$$\nu_2 p_1^2 + (1 - \nu_2) q_2^2 = e_1'',$$
$$\nu_2 p_1 q_1 + (1 - \nu_2) q_2 p_2 = \tfrac{1}{2} e_2'',$$
$$\nu_2 q_1^2 + (1 - \nu_2) p_2^2 = e_3''.$$

Only two of these equations are independent and it is convenient to write
them in the form:

$$\nu_2 p_1 + (1 - \nu_2) q_2 = e_1'' + \tfrac{1}{2} e_2'', \qquad \nu_2 p_1^2 + (1 - \nu_2) q_2^2 = e_1'' \qquad \text{(v)}.$$

These will give, if $p_1$ and $q_2$ are known, one equation for the determination of
$\nu_2$ and one equation of condition.

Clutches of 3 eggs.

If the distribution of clutches be $N_3 (e_1''' + e_2''' + e_3''' + e_4''')$ we have:

$$\nu_3 p_1^3 + (1 - \nu_3) q_2^3 = e_1''',$$
$$\nu_3 p_1^2 q_1 + (1 - \nu_3) q_2^2 p_2 = \tfrac{1}{3} e_2''',$$
$$\nu_3 p_1 q_1^2 + (1 - \nu_3) q_2 p_2^2 = \tfrac{1}{3} e_3''',$$
$$\nu_3 q_1^3 + (1 - \nu_3) p_2^3 = e_4'''.$$

Only three of these equations are independent and these may be written:

$$\nu_3 p_1^3 + (1 - \nu_3) q_2^3 = e_1''',$$
$$\nu_3 p_1^2 + (1 - \nu_3) q_2^2 = e_1''' + \tfrac{1}{3} e_2''',$$
$$\nu_3 p_1 + (1 - \nu_3) q_2 = e_1''' + \tfrac{2}{3} e_2''' + \tfrac{1}{3} e_3''' \qquad \text{(vi)}.$$

These suffice to determine $\nu_3$, $p_1$ and $q_2$.
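Writing the right-hand sides of (vi) $f_3'''$, $f_2'''$, $f_1'''$ respectively we find:

$$\nu_3 = (f_1''' - q_2)/(p_1 - q_2) \qquad \text{(vii)},$$

which suffices to find $\nu_3$, when $p_1$ and $q_2$ are found, and

$$p_1 = (f_2''' - f_1''' q_2)/(f_1''' - q_2), \qquad p_1^2 = (f_3''' - f_2''' q_2)/(f_1''' - q_2) \qquad \text{(viii)},$$

which lead to the quadratic for $q_2$

$$q_2^2 \,(f_2''' - f_1'''^{\,2}) - q_2 \,(f_3''' - f_1''' f_2''') + (f_1''' f_3''' - f_2'''^{\,2}) = 0 \qquad \text{(ix)}.$$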
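The elimination leading from (vi) to (vii), (viii) and the quadratic (ix) can be written out step by step. The following is a sketch only; for brevity it drops the triple primes and writes $p$, $q$, $\nu$ for $p_1$, $q_2$, $\nu_3$, a shorthand that is not the memoir's.

```latex
\begin{gather*}
\nu p + (1-\nu) q = f_1
  \;\Rightarrow\; \nu = \frac{f_1 - q}{p - q}, \\
\nu p^2 + (1-\nu) q^2 = f_2
  \;\Rightarrow\; (f_1 - q)(p + q) + q^2 = f_2
  \;\Rightarrow\; p = \frac{f_2 - f_1 q}{f_1 - q}, \\
\nu p^3 + (1-\nu) q^3 = f_3
  \;\Rightarrow\; (f_1 - q)(p^2 + pq + q^2) + q^3 = f_3
  \;\Rightarrow\; p^2 = \frac{f_3 - f_2 q}{f_1 - q}, \\
(f_3 - f_2 q)(f_1 - q) = (f_2 - f_1 q)^2
  \;\Rightarrow\; (f_2 - f_1^2)\, q^2 - (f_3 - f_1 f_2)\, q + (f_1 f_3 - f_2^2) = 0 .
\end{gather*}
```

The last line is obtained by equating the square of the expression for $p$ with the expression for $p^2$; its coefficients agree with the numerical form of (ix) quoted in the text.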

We could therefore solve (ix) and choose the appropriate root for $q_2$, find the
corresponding $p_1$ from (viii), determine $\nu_3$ from (vii), $\nu_2$ from the first of (v) and $\nu_1$
from (iv). We might then use the second equation of (v) as an equation of
condition. But clearly this would not be satisfactory as all our quantities are
subject to considerable sampling errors. The correct method would be to determine
$\nu_1$, $\nu_2$, $\nu_3$, $p_1$ and $q_2$ from the six equations (iv), (v) and (vi) so as to get the
best values of these variables. But this would be a very laborious process. We
propose therefore to determine $p_1$ and $q_2$ by the method of least squares from the
three equations

$$p_1 = (f_2'' - f_1'' q_2)/(f_1'' - q_2)\,^{*},$$
$$p_1 = (f_2''' - f_1''' q_2)/(f_1''' - q_2),$$
$$p_1^2 = (f_3''' - f_2''' q_2)/(f_1''' - q_2) \qquad \text{(x)},$$

using to obtain linearity $q_2 = \bar{q}_2 + \eta$, where $\bar{q}_2$ is the value given by the quadratic
(ix) and $\eta$ is supposed a small quantity with negligible square. The values of
$p_1$ and $q_2$ found from (x) will be good, if not the best. Our system for $\nu_1$, $\nu_2$, $\nu_3$,
$p_1$, $q_2$ will not be the optimum possible, but if our system is probable, that will be
still more probable and the hypothesis of the two gentes of tern hens will not be
contradicted by the data.


* Obtained by writing $f_1'' = e_1'' + \tfrac{1}{2} e_2''$, $f_2'' = e_1''$ and eliminating $\nu_2$ between two equations of (v).


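Our system of e’s is:

$$e_1' = .540{,}1460, \quad e_1'' = .376{,}4045, \quad e_2'' = .106{,}7416,$$
$$e_1''' = .305{,}4187, \quad e_2''' = .039{,}4089, \quad e_3''' = .068{,}9655,$$

leading to:

$$f_1'' = .429{,}7753, \quad f_2'' = .376{,}4045,$$
$$f_1''' = .354{,}6798, \quad f_2''' = .318{,}5550, \quad f_3''' = .305{,}4187.$$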


(ix) now becomes:

$$.192{,}7573\, q_2^2 - .192{,}4337\, q_2 + .006{,}8485 = 0,$$

giving the small value $\bar{q}_2 = .036{,}9571$ for the chance of a green gens hen laying a
brown egg.
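The coefficients of (ix) and its small root can be recomputed from the three-egg frequencies. A minimal sketch, with `f1`, `f2`, `f3` standing for the triple-primed f's:

```python
import math

# Observed proportions of three-egg clutches: BBB, BBG, BGG.
e1, e2, e3 = 62 / 203, 8 / 203, 14 / 203

f3 = e1
f2 = e1 + e2 / 3
f1 = e1 + 2 * e2 / 3 + e3 / 3

# Quadratic a q^2 + b q + c = 0 for the chance of a green-gens hen laying brown.
a = f2 - f1 ** 2
b = -(f3 - f1 * f2)
c = f1 * f3 - f2 ** 2

# The smaller root is the admissible one (a small chance of the 'wrong' colour).
q_bar = (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)
print(round(a, 7), round(b, 7), round(c, 7), round(q_bar, 7))
```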


We now return to (x) substituting the f’s and $.036{,}9571 + \eta$ for $q_2$. Expanding
and neglecting $\eta^2$ we obtain, on extracting the root of $p_1^2$ in the third equation:

$$p_1 = .917{,}7816 + 1.242{,}3209\, \eta,$$
$$p_1 = .961{,}3638 + 1.909{,}4759\, \eta,$$
$$p_1 = .961{,}3638 + .991{,}4412\, \eta.$$

Solved by least squares these equations give for type equations:

$$p_1 = .946{,}8364 + 1.381{,}0793\, \eta,$$
$$p_1 = .948{,}2960 + 1.489{,}7513\, \eta,$$

leading to:

$$p_1 = .928{,}2876, \quad q_1 = .071{,}7124,$$
$$p_2 = .976{,}4736, \quad q_2 = .023{,}5264.$$
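The least-squares step can be retraced from the three linearised equations. This sketch treats them as equally weighted observation equations p₁ − bᵢη = aᵢ in the two unknowns p₁ and η, which reproduces the two type (normal) equations given in the text; the equal weighting is inferred from that agreement rather than stated in the memoir.

```python
# Linearised equations p1 = a_i + b_i * eta, as printed.
a = [0.9177816, 0.9613638, 0.9613638]
b = [1.2423209, 1.9094759, 0.9914412]
q_bar = 0.0369571

n = len(a)
sum_a = sum(a)
sum_b = sum(b)
sum_bb = sum(x * x for x in b)
sum_ab = sum(x * y for x, y in zip(a, b))

# Normal equations:  n*p1 - sum_b*eta = sum_a ;  sum_b*p1 - sum_bb*eta = sum_ab.
det = -n * sum_bb + sum_b ** 2
eta = (n * sum_ab - sum_b * sum_a) / det
p1 = (sum_a + sum_b * eta) / n
q2 = q_bar + eta

print(round(sum_a / n, 7), round(sum_b / n, 7))          # first type equation
print(round(sum_ab / sum_b, 7), round(sum_bb / sum_b, 7))  # second type equation
print(round(p1, 7), round(q2, 7))
```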


Whence from (vii), the first of (v) and (iv),

$$\nu_3 = .366{,}0119, \quad 1 - \nu_3 = .633{,}9881,$$
$$\nu_2 = .449{,}0123, \quad 1 - \nu_2 = .550{,}9877,$$
$$\nu_1 = .571{,}0011, \quad 1 - \nu_1 = .428{,}9989.$$
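Each proportion of brown layers has the same form, (f − q₂)/(p₁ − q₂), with f the mean proportion of brown eggs in clutches of that size. A short check (the dictionary name is illustrative):

```python
p1, q2 = 0.9282876, 0.0235264

# Mean proportion of brown eggs per egg in clutches of 1, 2 and 3.
brown_share = {
    1: 74 / 137,
    2: (67 + 19 / 2) / 178,
    3: (62 + 2 * 8 / 3 + 14 / 3) / 203,
}

nu = {size: (f - q2) / (p1 - q2) for size, f in brown_share.items()}
for size, value in nu.items():
    print(size, round(value, 7), round(1 - value, 7))
```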


Thus about 7 % of the eggs laid by hens of the brown-laying gens will be
green, and only about 2 % of the eggs laid by hens of the green-laying gens will
be brown. Further the green-laying gens is far more fertile than the brown-
laying gens, the proportion of brown to green layers falling from 57 to 43 in the
single clutches to 37 to 63 in the triple clutches. The following is our analysis
on this basis.
Single Egg Nests.

               B       G
Observed       74      63
Theoretical    74      63

Number of brown egg layers 78.23.
Number of green egg layers 58.77.

Number of brown egg layers with brown eggs 72.62.
Number of brown egg layers with green eggs 5.61.
Number of green egg layers with green eggs 57.39.
Number of green egg layers with brown eggs 1.38.

Two Egg Nests.

               B²      BG      G²
Observed       67      19      92
Theoretical    68.92   15.15   93.93

Number of brown egg layers 79.92.
Number of green egg layers 98.08.

Number of brown egg layers who lay both eggs brown 68.87.
Number of green egg layers who lay both eggs brown 0.05.
Number of brown egg layers who lay one brown, one green 10.64.
Number of green egg layers who lay one brown, one green 4.51.
Number of brown egg layers who lay both eggs green 0.41.
Number of green egg layers who lay both eggs green 93.52.

Three Egg Nests.

               B³      B²G     BG²     G³
Observed       62      8       14      119
Theoretical    59.43   13.98   9.72    119.87

Number of brown egg layers 74.30.
Number of green egg layers 128.70.

Number of brown egg layers who lay 3 brown eggs 59.43.
Number of green egg layers who lay 3 brown eggs 0.00.
Number of brown egg layers who lay 2 brown and 1 green 13.77.
Number of green egg layers who lay 2 brown and 1 green 0.21.
Number of brown egg layers who lay 1 brown and 2 green 1.06.
Number of green egg layers who lay 1 brown and 2 green 8.66.
Number of brown egg layers who lay 3 green eggs 0.03.
Number of green egg layers who lay 3 green eggs 119.84.
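The theoretical frequencies are binomial mixtures of the two gentes. The sketch below recomputes them from the fitted constants; the function name and dictionaries are illustrative, and small differences in the last figure against the printed values are to be expected from rounding.

```python
from math import comb

P1, Q2 = 0.9282876, 0.0235264   # brown gens lays brown; green gens lays brown
NU = {1: 0.5710011, 2: 0.4490123, 3: 0.3660119}   # share of brown layers
N = {1: 137, 2: 178, 3: 203}                      # clutches of each size

def expected(size):
    """Expected numbers of clutches with 0, 1, ..., size green eggs."""
    q1, p2 = 1 - P1, 1 - Q2
    out = []
    for green in range(size + 1):
        brown = size - green
        from_brown_gens = NU[size] * comb(size, green) * P1 ** brown * q1 ** green
        from_green_gens = (1 - NU[size]) * comb(size, green) * Q2 ** brown * p2 ** green
        out.append(N[size] * (from_brown_gens + from_green_gens))
    return out

for size in (1, 2, 3):
    print(size, [round(x, 2) for x in expected(size)])
```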


It will be noted that the theory gives for the B²G and BG² about inverted
proportions. It also falls short in the BG group. These would very probably
have been bettered with a more general solution of our six equations. But are
the existing frequencies inconsistent with our observations and beyond the limits
of random sampling? Summing up our results we have:

|            |  B |  G |    B² |    BG |    G² |    B³ |   B²G |  BG² |     G³ |
| Observed   | 74 | 63 |    67 |    19 |    92 |    62 |     8 |   14 |    119 |
| Calculated | 74 | 63 | 68.92 | 15.15 | 93.93 | 59.43 | 13.98 | 9.72 | 119.87 |











From these we find χ² = 5.631, giving P = .688, or in 69 trials out of 100 the
sample would be more discordant from the calculated than the actual observations.
There is accordingly nothing to be said against the theory on the ground of its
statistical improbability.
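The value of χ² can be recomputed from the summary table. For P, this sketch assumes the nine groups were entered in the goodness-of-fit tables as eight degrees of freedom; that assumption is not stated in the text but reproduces the printed figure. The tail probability uses the closed form available for an even number of degrees of freedom.

```python
import math

observed = [74, 63, 67, 19, 92, 62, 8, 14, 119]
calculated = [74, 63, 68.92, 15.15, 93.93, 59.43, 13.98, 9.72, 119.87]

chi_sq = sum((o - c) ** 2 / c for o, c in zip(observed, calculated))

def chi_sq_tail_even(x, df):
    """P(chi-square with even df exceeds x): exp(-x/2) * sum_{j<df/2} (x/2)^j / j!."""
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)
    return math.exp(-half) * total

p_value = chi_sq_tail_even(chi_sq, 8)   # df = 8 is an assumption, see above
print(round(chi_sq, 3), round(p_value, 3))
```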


Again of the two hypotheses involved, (i) the greater fertility of the green
egg layers, (ii) the fixed small probability that a hen of one gens will lay
occasionally an egg of the colour of the other gens, the first seems not unreasonable; the
second gives merely a quantitative measure of the assumption made by a number
of ornithologists that birds can lay eggs of two colours. It assumes, however, that
as a rule they do not. Clearly we need to know more of the mechanism of egg
coloration before we can settle how it happens that a bird usually staining its
egg brown will stain it green on a few occasions. If it be a result of type of
food, we have to assume that our two gentes feed as a rule differently, which is
not easily to be admitted. Will this feeding habit then be hereditary and if so
are the male birds also divided into two gentes and is the mating assortative?
Granted on the other hand that it is not due to food, but to differences of
pigmentation mechanism, we are compelled to ask whether this mechanism is inherited
only through the female. If not, then are the matings within the gens, or what
is the pigmentation mechanism of heterozygote hens? If we could establish the
existence of the two gentes each with its rule and its fixed exception to rule; if
further the pigmentation mechanism, as one must decidedly expect from the eggs
of many species, is markedly hereditary, then it is possible that in these clutches
of composite colour lies the solvent of some difficulties which the Mendelian
explanation meets with when the product of two protogene zygotes instead of
being protogene is in rare cases found to be allogene.


(5) The Organic Correlations. 


We devote this section to a consideration of the degree of relationship between 
size, shape and colour characters of the same egg, and their relative values in the 
seasons 1913 and 1914. 


(i) Mottling and Breadth, Length and Index of Egg. 


The value of the correlation of mottling and breadth in the 1913 census was
.1803, but unfortunately the sign of it was possibly wrongly given, as may be seen
from the Table p. 150 of the former paper (Biometrika, Vol. x.). We have taken
occasion already to refer to the difficulties in the mottling scale used, but after
much consideration we are unable to substantially modify the assumed order of
mottling of the previous paper. In broad lines we have:


TABLE XII.

|                             |              Mean Breadth                       |
| Mottling                    | 1913 census (291 eggs) | 1914 census (1108 eggs) |
| Confluent blotches, d+e+g   |          2.97          |          3.03           |
| Transition Forms, a+b       |          2.97          |          2.95           |
| Discrete, Copious, c        |          2.99          |          3.02           |
| Discrete, Sparse, h+f+i     |          2.96          |          3.01           |














The value of polyserial η corrected for class index correlation of mottling is
.1753 for the census of 1914. It is therefore certainly within the probable error
of the difference. Now in both cases the confluent mottling gives a greater
breadth than the discrete and sparse mottling, but the transition forms a+b,
and c, are anomalous. The correlation ratio η in both cases is significant and
shows a relation, not very intense, between mottling and breadth, but in the
present stage of the mottling classification it is certainly not possible to unravel
the relationship. The 1914 returns undoubtedly seem to indicate that not only
the confluently but the discretely mottled eggs have the greater breadths, the
lesser breadths being found in the transition forms. It should be noted that the
returns for 1914 being nearly four times as numerous are worth twice as much.


If we could really lay any stress on the sign to be given to the association, we 
should have to assert that in the species at large the rule is opposite to that for 
the individual hen. In her case the broader egg has less mottling, while in the 
species the broader egg has the greater or at least the more confluent mottling. 
The former relation overrides any result to be obtained from the species as a 
whole, and seems to oppose any theory that greater pressure during transition 
through the oviduct is the source of greater mottling*. 


We have further worked out the association† of Index and Length to the
Mottling. We have


                          Census    Census
                           1913      1914
Mottling and Breadth      .1803     .1753
Mottling and Length         -       .0937     η₀ = .0850 ± .0203‡
Mottling and Index        .1550     .1598


Since the probable error is of the order .02 we see that the value is insignificant
for length. On the other hand the order of mottling classes in the three
cases does not appear interpretable. The following table gives the means for
each class of mottling as specified in Plate VIII of the first memoir.


* The time of transition through the oviduct may conceivably be a factor of greater importance.

† Obtained from polychoric η with correction for number of arrays and the class index correction
for mottling.
‡ η₀ is the mean value of the correlation ratio on the assumption of no association.


TABLE XIII.
Mottling and Size and Shape of Egg.

| Mottling | Index | Breadth | Length |
| b        | 69.52 |  2.954  | 4.256  |
| i        | 70.60 |  2.998  | 4.255  |
| a        | 70.68 |  2.947  | 4.170  |
| e        | 71.71 |  3.025  | 4.228  |
| c        | 71.86 |  3.016  | 4.200  |
| f        | 71.90 |  3.001  | 4.184  |
| h        | 72.12 |  3.042  | 4.220  |
| g        | 72.65 |  3.032  | 4.184  |
| d        | 73.08 |  3.017  | 4.133  |


The series for index in ascending order corresponds roughly to a series in
ascending order for breadth and descending order for length, but the system does
not correspond to any easily appreciated mottling order. It appears as if the
fieldworkers might have been influenced by shape of egg, instead of merely
comparing the nature of the mottling in selecting type. At any rate in this
section no final conclusions can be drawn, and it seems very desirable that more
elaborate descriptions of mottling should in future be carried out.


(ii) Ground Colour and Breadth, Length and Index of Egg.


The following scheme gives our results. The first value of η is the uncorrected η′,
the second the value when corrected for number of arrays and class index
correlation, which is .9785 for brown and .9860 for green eggs.


TABLE XIV.

|     |            Index            |           Breadth           |           Length            |
|     |    Brown     |    Green     |    Brown     |    Green     |    Brown     |    Green     |
| η′  |    .1747     |    .1733     |    .1348     |    .1385     |    .2061     |    .1432     |
| η   |    .1011     |    .1313     |  imaginary†  |    .0773     |    .1530     |    .0857     |
| η₀* | .1432 ± .0322 | .1160 ± .0261 | .1432 ± .0322 | .1160 ± .0261 | .1432 ± .0322 | .1160 ± .0261 |

* Mean value of η supposing no association.

† This signifies that if η₀² be taken from η′² the difference is negative, i.e. η′ is less than the mean
value of η for zero association.


It will be fairly obvious from this table that there is no association of ground
colour, whether green or brown, with either size or shape of egg*. This does not
appear at all unreasonable if we assume the ground colour to be deposited before
the egg enters the oviduct or the shell becomes finally hardened.

The general conclusion therefore to be drawn from the present investigation is 
that intensity of ground colour, whether green or brown, has no relation to egg 
size and shape, but that breadth of egg, whether considered directly or through 
the index, is more probably related though not intensely to mottling, but the 
nature of the relationship must be obscure until a more elaborated classification of 
mottling has been adopted. 


(iii) Relation of Mottling to Ground Colour. 

The data are given in Table G* at the end of this memoir, where we have
separated brown from green eggs, because it is conceivable that the relationships,
if any, for the two categories might be different. If C₀ denote the mean
contingency when there is zero association we have:


For the Brown Eggs: C₀ = .2830 ± .0323.
Uncorrected Contingency: C₂ = .2030.
For the Green Eggs: C₀ = .2118 ± .0261.
Uncorrected Contingency: C₂ = .2557.

Thus for the brown eggs there is no significance in C₂, it being less than the
mean value of the contingency, when there is no association. For the green eggs
C₂ is greater than C₀ but the difference is less than twice the probable error of C₀;
we cannot therefore assert any real relation to exist between mottling and intensity
of ground colour†. Under the circumstances of the above relation of C₂ to C₀, it
did not seem necessary to correct C₂, as such correction would not alter the
conclusion of no significant association. Although the intensity of ground colour
may have no relation to mottling, it is conceivable that the colour of the egg may
itself have relation to mottling or indeed to intensity of ground colour, i.e. a brown
egg may have deeper tones of ground colour and denser mottling than a green egg.


We have the following biserial tables to illustrate these cases. 


TABLE XV.
Mottling and Colour of Egg.
Mottling Categories.

| Colour of Egg | g+d |   e | a+b | c+h |   f |  i | Totals |
| Brown ...     |  36 |  89 |  29 | 215 |  57 | 11 |    437 |
| Green ...     |  61 | 148 |  39 | 300 |  98 | 23 |    669 |
| Totals ...    |  97 | 237 |  68 | 515 | 155 | 34 |   1106 |
































* This statement is not really contradicted by the η = .1506 of p. 148 of the previous memoir, for
with the small number 291 eggs of that census η₀ = .1655 ± .0395, so that η is less than the value for
zero association.

† Examined in the same manner the result for 1913 appears not to have the significance we
attributed to it. We have C₀ = .2813 ± .0395, while the corrected contingency is only C₂ = .2260. Thus
C₂ is actually less than the mean value when there is no contingency.
The order of mottling categories seems to correspond as closely as we can
determine from the plate to the order of relative amount of mottling.


We find for the mean η² when there is no association:
η₀² = .004,521.
Hence η′ corrected for number of arrays but not for class-index is given by
η² = (.004,702 − .004,521)/.996,383 = .000,1817,
or η = .0135.


This is insignificant and therefore we need not trouble to find the class-index
correction. It would not appear therefore that the brown eggs are more densely
mottled than the green eggs.


We now pass to intensity of ground colour. It will be remembered that two 
scales were formed of ‘values’ giving as far as possible equal values by the same 
letters for both green and brown colours. 


TABLE XVI.
Colour and Value.

                      Ground Colour Values.
Colour of Egg |   A  |   B  |   C  |   D  |   E  |  F+G | H+I+K | Totals
Brown         |   52 |   76 |   63 |   95 |   51 |   44 |    56 |    437
Green         |   34 |   51 |   71 |  133 |   85 |  154 |   141 |    669
Totals        |   86 |  127 |  134 |  228 |  136 |  198 |   197 |   1106


























It is clear on the face of this table that the percentage of high values in the 
brown series is far greater than in the green series, which has a much greater 
percentage of low-colour values. To get an appreciation of this association we use
biserial $\eta$. We have for zero association
$$\bar{\eta}^2 = 0.005425,$$
while uncorrected $\eta^2 = 0.113734$.
Accordingly corrected for number of arrays
$$\eta^2 = \frac{0.113734 - 0.005425}{0.995479} = 0.1088008,$$
leading to $\eta = 0.3299$.

Calculating the class-index correlation, we find it 0.9674 and thus finally
corrected
$$\eta = 0.3410 \pm 0.0197.$$
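The arithmetic of the two corrections above can be retraced with a short sketch. The function name is illustrative only; the denominators are simply taken as printed in the text (the second is read here as 0.995479, the printed figure being damaged), since the memoir does not restate how they are derived.

```python
from math import sqrt

def corrected_eta(raw_eta_sq, mean_eta_sq, denominator, class_index=1.0):
    """Correct eta^2 for number of arrays, then divide eta by the class-index correlation.

    The denominator is supplied as printed in the memoir, not derived here.
    """
    eta_sq = (raw_eta_sq - mean_eta_sq) / denominator
    return sqrt(eta_sq) / class_index

# Mottling and colour of egg (Table XV): no class-index correction applied.
eta_mottling = corrected_eta(0.004702, 0.004521, 0.996383)

# Colour and ground colour value (Table XVI): before and after class-index correction.
eta_colour = corrected_eta(0.113734, 0.005425, 0.995479)
eta_colour_final = corrected_eta(0.113734, 0.005425, 0.995479, class_index=0.9674)

print(f"mottling eta        = {eta_mottling:.4f}")
print(f"colour eta          = {eta_colour:.4f}")
print(f"colour eta (final)  = {eta_colour_final:.4f}")
```

The results agree with the printed 0.0135, 0.3299 and 0.3410 to within a unit in the fourth decimal place.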


Biometrika xu 
This is a significant and fairly substantial correlation between colour and
colour value.


It would appear as if absence of Sorby's oorhodeine pigment* also involved
less copious pigment material in general.
(iv) Organic Relations in Shape and Size. 


The fundamental tables are Tables G to L at the end of the paper. The 
correlations are as follows: 


TABLE XVII.

Organic Correlations in two Seasons.

Character Pair                     | Symbols      | Correlation, 1914 (c. 1110) | Correlation, 1913 (c. 294)
Length and Breadth                 | $L, B$       |  0.2104 ± 0.0193            |  0.2220 ± 0.0374
Longitudinal and Equatorial Girths | $G_l, G_e$   |  0.5139 ± 0.0149            |  0.5297 ± 0.0284
Length and Longitudinal Girth      | $L, G_l$     |  0.8515 ± 0.0055            |  0.8804 ± 0.0088
Breadth and Longitudinal Girth     | $B, G_l$     |  0.4840 ± 0.0155            |  0.5216 ± 0.0286
Index and Length                   | $I, L$       | −0.7577 ± 0.0086            | −0.7284 ± 0.0185
Index and Breadth                  | $I, B$       |  0.4537 ± 0.0161            |  0.5033 ± 0.0294
Index and Longitudinal Girth       | $I, G_l$     | −0.4496 ± 0.0161            | −0.3832 ± 0.0336











The following table contains the seasonal difference and its probable error. 


TABLE XVIII.

Seasonal Change in Correlation.

Character Pair    | Δ = 1914 − 1913 | Probable error of Δ
$L$ and $B$       |     −0.0116     |      ±0.0421
$G_l$ and $G_e$   |     −0.0158     |      ±0.0321
$L$ and $G_l$     |     −0.0289     |      ±0.0104
$B$ and $G_l$     |     −0.0376     |      ±0.0325
$I$ and $L$       |     −0.0293     |      ±0.0204
$I$ and $B$       |     −0.0496     |      ±0.0335
$I$ and $G_l$     |     −0.0664     |      ±0.0373
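The entries of Table XVIII can be reproduced from Table XVII. The sketch below assumes, as the printed figures suggest, that the probable error of each difference is obtained by combining the two seasonal probable errors in quadrature (the two seasons being treated as independent samples). The dictionary keys are labels chosen here for illustration, and the 1914 girth correlation is read as 0.5139.

```python
from math import sqrt

# (r_1914, p.e._1914, r_1913, p.e._1913), as read from Table XVII
TABLE_XVII = {
    "L and B":   (0.2104, 0.0193, 0.2220, 0.0374),
    "Gl and Ge": (0.5139, 0.0149, 0.5297, 0.0284),
    "L and Gl":  (0.8515, 0.0055, 0.8804, 0.0088),
    "B and Gl":  (0.4840, 0.0155, 0.5216, 0.0286),
    "I and L":   (-0.7577, 0.0086, -0.7284, 0.0185),
    "I and B":   (0.4537, 0.0161, 0.5033, 0.0294),
    "I and Gl":  (-0.4496, 0.0161, -0.3832, 0.0336),
}

def seasonal_change(r14, pe14, r13, pe13):
    """Return the difference 1914 - 1913 and its probable error (quadrature sum)."""
    return r14 - r13, sqrt(pe14 ** 2 + pe13 ** 2)

changes = {pair: seasonal_change(*row) for pair, row in TABLE_XVII.items()}

for pair, (delta, pe) in changes.items():
    # The last column is the difference expressed in units of its probable error.
    print(f"{pair:10s} {delta:+.4f} ± {pe:.4f}   ({abs(delta) / pe:.2f} p.e.)")
```

Only the pair Length and Longitudinal Girth exceeds about two and a half times its probable error, which is the case singled out in the text.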





With the exception of the correlation of Length and Longitudinal Girth none 
of these differences has a significant relation to their probable errors. In the case 
mentioned, however, such a deviation would occur in excess 3 times in 100 trials 
and in defect 3 times in 100 trials, or as we have made 7 trials the odds against it are 
only 52 to 48. We cannot therefore lay much stress on it, and conclude that no 
seasonal change in the organic correlations is to be observed between 1913 and 1914. 
As there were considerable changes in the means (see our p. 310) this result confirms 


* «On the Colouring-matters of the Shells of Birds’ Eggs,” Zoological Society’s Proceedings, 1875, 
p. 359. 
the general conclusion that, except for very skew distributions, change in means
does not involve change in correlation. Change in variability does usually denote
change in correlation, but as we have indicated (p. 311) the changes in variability
are not significant except in the girths, and this may be the source of such
modification as we find in the correlation of Length and Longitudinal Girth. To
test this we note that if Longitudinal Girth only be changed the regression
coefficient of the Length on the Longitudinal Girth ought not to be changed
within the limits of random sampling. For 1914 this coefficient of regression is
0.4501 ± 0.0056 and for 1913 is 0.4215 ± 0.0089. Hence the difference is 0.0286 ± 0.0105.
Thus the difference in the regression coefficients is just as significant as it was in
the correlation coefficients, or is not explicable on the basis of increased variability
in the Longitudinal Girth. If it is, which we doubt, to be considered
significant it must depend on something else than a more variable Longitudinal
Girth.
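As a rough check on the figures quoted for Length and Longitudinal Girth, the following sketch converts a difference and its probable error into a one-sided normal tail area. It assumes the usual normal-law relation that one probable error equals 0.6745 standard deviations; the function and variable names are illustrative.

```python
from math import erfc, sqrt

PROBABLE_ERROR_IN_SIGMA = 0.6745  # one probable error in standard-deviation units

def one_sided_tail(difference, probable_error):
    """Chance of a deviation at least this large in one direction, on the normal law."""
    z = abs(difference) / probable_error * PROBABLE_ERROR_IN_SIGMA
    return 0.5 * erfc(z / sqrt(2))

# Seasonal difference in the correlation of Length and Longitudinal Girth.
tail_correlation = one_sided_tail(0.0289, 0.0104)

# Seasonal difference in the regression coefficient of Length on Longitudinal Girth.
regression_difference = 0.4501 - 0.4215
regression_pe = sqrt(0.0056 ** 2 + 0.0089 ** 2)
tail_regression = one_sided_tail(regression_difference, regression_pe)

print(f"correlation: tail = {tail_correlation:.3f}")
print(f"regression:  difference = {regression_difference:.4f} ± {regression_pe:.4f}, "
      f"tail = {tail_regression:.3f}")
```

The first tail area comes out at about 3 in 100, as stated in the text, and the regression difference stands in nearly the same ratio to its probable error as the correlation difference does.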


We may consider in this place what changes have taken place in the formula
connecting Longitudinal Girth with Length and Breadth. For 1914 we have:
$$G_l = 1.1278\,B + 1.4840\,L + 1.9180,$$
while for 1913 we had:
$$G_l = 1.2701\,B + 1.6415\,L + 0.8224.$$


The changes in the coefficients look more considerable than the changes that
will be found in the values for $G_l$ calculated from either formula for eggs which
are not extreme variants. At the same time the differences rather tend to
emphasise the suggestion given by the correlation of $G_l$ with $L$, that there may
have been a seasonal change in the organic relationship between these characters.
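The remark that the two formulae differ little for ordinary eggs can be illustrated numerically. In the sketch below the constant of the 1913 formula is read as 0.8224, and the two sample eggs are hypothetical values chosen for illustration (one near the modal classes of Tables A and B, one small), not measurements from the memoir.

```python
def girth_1914(breadth, length):
    """Longitudinal girth predicted by the 1914 formula."""
    return 1.1278 * breadth + 1.4840 * length + 1.9180

def girth_1913(breadth, length):
    """Longitudinal girth predicted by the 1913 formula (constant read as 0.8224)."""
    return 1.2701 * breadth + 1.6415 * length + 0.8224

# Hypothetical eggs: (breadth, length) in the units of the tables.
typical_egg = (3.00, 4.20)
small_egg = (2.70, 3.50)

typical_gap = girth_1914(*typical_egg) - girth_1913(*typical_egg)
small_gap = girth_1914(*small_egg) - girth_1913(*small_egg)

print(f"typical egg: 1914 formula minus 1913 formula = {typical_gap:+.4f}")
print(f"small egg:   1914 formula minus 1913 formula = {small_gap:+.4f}")
```

For the typical egg the two formulae agree closely, while for the small egg the gap is many times larger, in keeping with the statement that the differences matter chiefly for extreme variants.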


(6) The Homotypic Correlations. 


The results for the 1914 season are of a very startling character: they demonstrate
that while the organic correlations remain nearly constant the homotypic
correlations can suffer a very considerable seasonal modification. In other
words the birds laid eggs very much more alike in 1914 than in 1913. The
reader will remember that 1913 was a bad season for the birds; many young
perished and there were few nests. On the other hand 1914 was a good season;
there was plenty of food, numerous and possibly stronger birds. The eggs in the
clutches were more alike in 1914 than in 1913.


We proceeded to investigate in the first place whether the greater intensity of
homotyposis was due to there being a far larger proportion of three-egg clutches.
Accordingly we took only the 1st and 2nd eggs in the clutches and obtained the
homotypic correlation for Equatorial Girth. It was 0.7535 for 383 pairs of eggs.
When we took all possible pairs out of all the clutches we had 796 pairs, and the
correlation instead of rising fell, but insignificantly, to 0.7469. The difference
between 1913 and 1914 cannot therefore be due to a far larger number of clutches
providing three pairs in the latter than in the former year.
TABLE XIX.
Direct Homotyposis in Size and Shape.

Characters          | Symbols              | Season 1913     | Season 1914
Lengths of Eggs     | $L, L$               | 0.4643 ± 0.0346 | 0.6056 ± 0.0107
Breadths of Eggs    | $B, B$               | 0.5176 ± 0.0326 | 0.7327 ± 0.0078
Longitudinal Girths | $G_l, G_l$           | 0.5076 ± 0.0327 | 0.6689 ± 0.0093
Equatorial Girths   | $G_e, G_e$           | 0.4621 ± 0.0350 | 0.7469 ± 0.0075
Index               | $100B/L$, $100B/L$   | 0.5537 ± 0.0308 | 0.5327 ± 0.0120
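The four size rows of Table XIX can be averaged to give a single figure for each season. The short sketch below does this and also expresses the rise as a proportion of the 1913 mean; the variable names are illustrative.

```python
# Direct homotypic correlations for the four size characters of Table XIX
# (lengths, breadths, longitudinal girths, equatorial girths).
SIZE_1913 = [0.4643, 0.5176, 0.5076, 0.4621]
SIZE_1914 = [0.6056, 0.7327, 0.6689, 0.7469]

mean_1913 = sum(SIZE_1913) / len(SIZE_1913)
mean_1914 = sum(SIZE_1914) / len(SIZE_1914)

# Rise of the mean size homotyposis relative to its 1913 value.
relative_rise = (mean_1914 - mean_1913) / mean_1913

print(f"mean size homotyposis 1913 = {mean_1913:.4f}")
print(f"mean size homotyposis 1914 = {mean_1914:.4f}")
print(f"relative rise              = {100 * relative_rise:.1f} per cent")
```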




















It must at once be admitted that this result is of a very startling character.
Only the homotyposis of the Index has remained without any significant change,
i.e. the degree of likeness in shape does not exhibit a seasonal change; in all four
cases of absolute size there are most substantial and of course significant changes
in the homotyposis. The mean size homotyposis has risen from 0.4879 to 0.6885,
i.e. by about 40%! It is difficult to offer a demonstrable explanation of this
great change. The factor we are seeking for must be one which modifies, so to
speak, the individuality of the bird between its successive egg layings, for
example a change in the climatic condition or in the food supply occurring in
1913 somewhere during the egg-laying period. Such a factor, however, would
lead us to suppose that the high values of 1914 were the normal homotypic
values, whereas they appear to us from the comparative standpoint to be the
abnormal. If we suppose only the stronger birds survived to the season 1914 and
that there was a plentiful food supply, it would seem that the community as a
whole should have exhibited less individuality in size and not more, since the weaker
birds obtaining less food supply would not appear. There is, however, so little
change of type and variability of the eggs in the two seasons that it is hard
to believe that selection of the birds is the source of the change. Further, if
anything the variability of the eggs is less in 1914 than 1913, and such reduction
of variability would tend to reduce rather than increase correlation. If we suggest
that 1913 killed off many of the old birds and that there was a larger proportion
of young birds in 1914, so that there was a more heterogeneous community
in 1914, we are pulled up by the fact that the eggs were on the average very
slightly larger in 1914, which is, perhaps, not what we should anticipate with a
larger proportion of first layers. It would seem as if we had to take refuge in
some very vague statement that the seasonal environment for 1914 interfered
less with individuality than that of 1913. But this does not really help us and
leaves us with the greater difficulty, that it suggests that 'individuality' is an
indefinite quantity from the statistical side and might result under favourable
environmental conditions in all the eggs of a clutch being perfectly alike! The
persistency in the Index value seems in itself to point to a limitation in
individuality, and it seems wisest at present to await further material before
speculating on the source of this marked seasonal change in size homotyposis.
One point, however, we can investigate, namely, whether pigmentation homotyposis
has or has not kept pace with size homotyposis.

With this aim in view the direct homotyposis has been worked out between
mottling of one egg and mottling of a second in the same clutch, and between
ground colour of one egg and ground colour of a second in the same clutch.
Further the cross-homotyposis has been determined between the mottling of one
egg and the ground colour of a second in the clutch. The fundamental difficulty
here lies in the treatment of the 'values' of the ground colour. We cannot
separate green eggs from brown, because of the occasional appearance of mixed
colour clutches. Nor would it be reasonable to work with contingency on a
20 × 20 category table. We have accordingly been compelled to pool green and
brown eggs, when they have the same 'value' on our colour scale. This at any
rate renders our present results comparable with those of 1913. But until we
know more of the mechanism of egg pigmentation it is impossible to assert that
equal 'values' in brown and green ground colours are what we should anticipate
as a result of individuality working occasionally with one and occasionally with
another pigment. The homotyposis pigmentation tables are given as Tables
R, S, and T at the end of this paper. In actually determining the contingency we
have clubbed d and e in the mottling together, and $A_1$ and $A_2$, $B_1$ and $B_2$,
$C_1$ and $C_2$, etc. in the value of the ground colour, thus reaching 8 × 8, 10 × 10 and
10 × 8 contingency tables. These have then been corrected for number of cells
and for class-index correction. The class-index correction for mottling is 0.9531,
and for value of ground colour 0.9848.


We consider first the cross-homotyposis of ground colour and mottling. The
coefficient of mean square contingency on the supposition that there is no
association between value of ground colour in one egg and mottling in a second
would be
$$\bar{C}_2 = 0.1992 \pm 0.0169.$$
The corrected actual coefficient of mean square contingency is
$$C_2 = 0.1479,$$
which is less than the mean square contingency coefficient for no association.
Accordingly there is no cross-homotyposis between mottling and ground colour,
and there should not be if our view be correct that the organic relationship in
the same egg is zero (see p. 328).
The value found for the 1913 data was
$$C_2 = 0.3989 \pm 0.0379,$$
and was spoken of as significant. But the fact was overlooked that
$$\bar{C}_2 = 0.3169 \pm 0.0451,$$
so that $C_2$ is less than twice the probable error greater than $\bar{C}_2$, and may well not
be significant. This conclusion is confirmed by the consideration that the organic
correlation of mottling and ground colour was really insignificant in 1913, and
thus it is exceedingly improbable that the cross-homotyposis could be significant*.
Direct homotyposis provides the results of the following table:


TABLE XX.
Direct Homotyposis in Mottling and Ground Colour Value.

Character                                  | Season 1913 | Season 1914
Mottling of Eggs in same Clutch            |   0.3500    |   0.6267
Ground Colour Value of Eggs in same Clutch |   0.5709    |   0.7480


The probable errors of the 1913 values are well below 0.045 and of the 1914
values well below 0.017. Accordingly the differences are markedly significant, or in
the nature of pigmentation the resemblance of eggs in the same clutch is much
more intense in 1914 than in 1913. Thus the results for size and shape of egg are
confirmed by those for pigmentation. We have therefore this very remarkable
fact, a fact which it seems to us may be of some consequence, namely that the
season can affect the extent to which the female bird impresses her individuality
on the external characters of the egg. It does not follow from this that seasonal
differences can affect in the like marked manner the individuality of the internal
characters of the egg. But it does raise the suggestion that it would be well
worth inquiring whether the degree of resemblance of offspring born in one
season can differ sensibly from the degree of resemblance of those born in another
season. Should such a difference be established, it would indicate that heredity (in
other words the nature of the germ plasm) could be more readily influenced by
seasonal differences than has yet been anticipated. We ourselves should be very
unwilling to admit this, but we must at the same time confess that we see no
obvious explanation of these significant changes in homotyposis. If individuality
impressed in the ovary and in the oviduct on the form and colouring of eggs can be
increased or decreased by seasonal differences, it is not a very long step to believe
that other physiological processes of this region which impress individuality on the
internal characters of the ovum can be modified by the nature of the season.





We now turn to the cross-homotyposis in size and shape of the tern’s egg: 


TABLE XXI.

Cross-Homotyposis in Size Characters.

Characters of the two Eggs         | Season 1913     | Season 1914
Length and Breadth                 | 0.0922 ± 0.0441 | 0.2621 ± 0.0187
Longitudinal and Transverse Girths | 0.2603 ± 0.0413 | 0.4546 ± 0.0134
Length and Longitudinal Girth      | 0.4229 ± 0.0362 | 0.5854 ± 0.0111
Breadth and Longitudinal Girth     | 0.2530 ± 0.0416 | 0.4162 ± 0.0140





* See above our second footnote on p. 328. In 1913 we had not fully realised how high $\bar{C}_2$ could
be for such short samples as a couple of hundred. Hence the source of the error.
We are thus again faced with the fact that the cross-homotyposis of the eggs
of 1914 is substantially higher than that of 1913. We still see the markedly
emphasised individuality of the female birds.


We have next to enquire whether, the organic relations being practically
constant, the cross-homotyposis has increased in proportion or not to the direct
homotyposis. We can test this by Pearson's suggested relationship*, namely

Cross-Homotypic Correlation of $x$ and $y$ = ½ {correlation of $x$ with $x$ + correlation
of $y$ with $y$} × {the organic correlation of $x$ with $y$}.

The following table gives the calculated and observed cross-homotypic correlations
for the seasons 1913 and 1914.


TABLE XXII.

Cross-Homotypic Correlations as Calculated and Observed.

                                   |      Season 1913      |      Season 1914
Character Pair of two Eggs         | Calculated | Observed | Calculated | Observed
Length and Breadth                 |   0.1090   |  0.0922  |   0.1408   |  0.2621
Longitudinal and Transverse Girths |   0.2568   |  0.2603  |   0.3638   |  0.4546
Length and Longitudinal Girth      |   0.4278   |  0.4229  |   0.5426   |  0.5854
Breadth and Longitudinal Girth     |   0.2674   |  0.2530  |   0.3392   |  0.4162
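The calculated columns of Table XXII follow from Pearson's relationship applied to the direct homotypic correlations of Table XIX and the organic correlations of Table XVII. The sketch below retraces that computation; the short keys ("L", "B", "Gl", "Ge") are labels chosen here, and the 1914 organic correlation of the two girths is read as 0.5139.

```python
# Direct homotypic correlations (Table XIX).
DIRECT = {
    1913: {"L": 0.4643, "B": 0.5176, "Gl": 0.5076, "Ge": 0.4621},
    1914: {"L": 0.6056, "B": 0.7327, "Gl": 0.6689, "Ge": 0.7469},
}

# Organic correlations within the same egg (Table XVII).
ORGANIC = {
    1913: {("L", "B"): 0.2220, ("Gl", "Ge"): 0.5297, ("L", "Gl"): 0.8804, ("B", "Gl"): 0.5216},
    1914: {("L", "B"): 0.2104, ("Gl", "Ge"): 0.5139, ("L", "Gl"): 0.8515, ("B", "Gl"): 0.4840},
}

# Observed cross-homotypic correlations (Table XXI).
OBSERVED = {
    1913: {("L", "B"): 0.0922, ("Gl", "Ge"): 0.2603, ("L", "Gl"): 0.4229, ("B", "Gl"): 0.2530},
    1914: {("L", "B"): 0.2621, ("Gl", "Ge"): 0.4546, ("L", "Gl"): 0.5854, ("B", "Gl"): 0.4162},
}

def calculated_cross(direct, organic_r, x, y):
    """Half the sum of the two direct correlations, times the organic correlation."""
    return 0.5 * (direct[x] + direct[y]) * organic_r

calculated = {
    season: {pair: calculated_cross(DIRECT[season], r, *pair)
             for pair, r in ORGANIC[season].items()}
    for season in ORGANIC
}

for season in (1913, 1914):
    for pair, value in calculated[season].items():
        gap = OBSERVED[season][pair] - value
        print(f"{season} {pair[0]:>2s},{pair[1]:<2s} calculated {value:.4f} "
              f"observed {OBSERVED[season][pair]:.4f} (observed - calculated {gap:+.4f})")
```

In 1913 the gaps are small and of both signs, whereas in 1914 every observed value exceeds the calculated one.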


Thus while the calculated values were in excellent accordance with the observed
in 1913, they are very inadequate to express the increased individuality in
1914. In other words the cross-homotyposis appears increased even at a greater
rate than the direct homotyposis which we have shown in itself to be markedly
emphasised.





What we are accordingly confronted with in the season 1914 is an exuberance 
of individuality and the possibilities which such a variation of individuality 
suggests. It may be confined to the externals of the egg, but the physiological 
factors which determine those externals must at least be in close proximity and 
may, perhaps, be affiliated with others which affect matters much more important. 
The approximate constancy of type, variability and organic correlation for these 
two seasons coupled with the marked change in homotyposis is a problem which 
demands further observations and much hard thinking. 


* Phil. Trans. Vol. 197 A, p. 290. 
TABLE A. Organic Correlation. Mottling and Breadth.





















































Mottling*. 
| Breadth | a | bile | da ny 9 A |i | Totals 
2-60—-+ | — re ee eee ee ee 1 
ee ee ie | oe Eee de ‘Een SS 3 
fo 8 | 1 8 Shy eae 1 7 
a gS ae 3a ts “Ss eee 19 
|. 2:80— Sete Shee ee ee ees 1 36 
| 2a—] 2| 7 | 8 | — SB @ a & ae 1 60 
290— | 5 9 | 4. | 3 | 30 | 20 7 4 | 5 198 
295— | 4 5 | 81 2/2 |] 33 | 1/61] 9 180 
300o— | 3 | 10 | 98 4-| 62 | 98 | 18 | 5 | 44 g3e 
3-05— 1 8 | 96 7 | 64 | 30 | 1 7 5 219 
fer sin tei“ |i aia 2| 5 147 
Sen es te | Oe 1 OS 5 9 4 1 51 
3:20— ritmie«atstiLe 3 1 1 21 
<a ee ee ra ne Be ae ee 3 
ee ee VE eee Sah ee oe 1 
Totals | 22 | 46 | 488 | 16 | 237 | 155 | 81 | 29 | 34 | 1108 
* Two eggs with no recorded mottling. † The class 2.60 contains all breadths from 2.595 to 2.645.


TABLE B. Organic Correlation. Mottling and Length. 
































Mottling*. 
Length a b | c | d e | Ff g h a Totals 
ee = ' 
3°25—+ ee ee t 
ose F— 1) — | —'— ft — 1}/—j— 1 
3°35— — ee — | —j|— 1 
sy— Ff -—-j; —!/- —j};—};orel—-je— 0 
Sh5—- F —-)' — , — } —}—fomtymti-f|e— 0 
3°50— ee | —j-— 1 
s36— F— | — fF — | —| 1 se ae oe eae a 1 
oe F— eK Lee he lel ete 0 
seo— F — | Fi} —!} 1] 2{/—j;—-j]-— 4 
s7o—- J — | — |) — |, - 1 2 ie Gee oes 4 
375— 1|— i= Te 1 +=) = 6 
sso— | — | — 3/— 1 2)/—)}—|—- 6 
38— | — | — | lo} — 3 2 O'S) aos -1 aie 18 
3°90— Tt = | 14 I 7 6 2 1 1 33 
3°95— 1 1 ot -/ 3 4 3 1 2 33 
4°00— 2 $s} 31 — 17 9 3 3; — 67 
4°05— 1 3 | 47 2} 18 | 12 | 10 1 $i 
4:10— 2 6 | 66 4 |-96 | 24 | 12 3 3 146 
h15— 3 8 | 57 1: | -21 14 4 5 5 118 
4:20— 3 4 | 65 4 | 34 | 2 | 18 3 4 155 
425— | — 2 | 32 2 19 10 7 2 1 75 
430— | 2 4 | 50 1 | 32 | {3 6 4 5 117 
s35— - — 4;/6 i} —} Boj n 3 3 1 59 
440— 1 4 20 — 10 6 i ae | 3 50 
4 45— 2 4j,233 | — | 18 3 ea 3 50 
450— Si—)} ww) — 8 2 2 1 2 29 
s0E— | — 1} 10 {—] 1 3 ij] — 1 17 
460— | — 1 | i=] % i = Pt 11 
see— Ff — |) —7; Fi —-] 87 —-]}—-}]—-]=- 3 
| 4:70— —}—}—}—j}ye—frye—yeH ye 0 
475— | — | — | a 1}; —|—]— 2 
480— | — |} —}- — | —]} type fie—}e—p— 1 
4°85. i. ome 1 | —_— —_—- | — — 1 —_ — 2 
Totals | 22 | 46 | 488 | 16 | 237 | 155 | 81 | 29 | 34 | 1108



















































* No mottling is recorded in the case of two eggs. † The class 3.25 embraces all lengths from 3.245 to 3.295.
TABLE C.
Organic Correlation. Mottling and Index. 
Mottling*. 
T n T 

Index a b | c | d | e |  g | g h a Totals | 
5a—+ | — — | — = | ne 1 — — ik 
60— — — 1 _— — —_ |; — —_ — : 
62— ] Sa es 3 5 | — = — 17 | 
64— 2 6 | 12 7 eS ee 3 26 
66— 2| 4/%/]— | 2% ]10) 4! 3 1 78 
68— 5 13 | 8 | 1 29 | 24 | 8 6 13 181 
70— 5 8 | 110 | 4 (5 en ee 16 4 8 260 | 
712. 4 6 j19 | 6; 51 | 35 | 2 5 2 253 
74— 1 6 77 4 36 24 12 10 5 175 
76— ] ae 36 — 10 12 8 — 2 69 | 
16 =e = 9 1 6 6 4 1 = 27 | 
80— —_ — 2 — 2 — — —_ - 4 | 
82— _- —_— - — — _— _ —_ _ 0 

= l = 1 = 1 ae = = 3 

} 86— —- — —_— - _ — l — —_— 1 

| s3g— 4 E = aa — = = - 0 

| 9m en en os coe 0 

| 99— = — — |} — 1 — caine — 1 

| Totals 22 46 488 16 237 155 81 | 29 34 1108 











Organic Correlation. 


* Two eggs with no recorded mottling. 


† The class 58 contains all eggs from 57.95 to 59.95.


TABLE D*, 


Value of Ground Colour. 


Value of Brown Ground Colour 


and Breadth. 


























Breadth} 4,| & | GQ) | & | A | a | my | 7, | &, | Totals | 
Ss SS ee Se ee Se eee ae eee o | 
2°65— _—_|— = —|- —}1i]— _ — 1 
2°70— 1} teyoyoflfetimtpwiyr-t}—f]e— 3 
27— | — | — 4 2 1 2;—]/1];-—-!/-— 10 
2:80— 2 1 3 1 5;—|—] 1] —j— 13 
2'85— 4 5 2 4 7 2s we DO ome 2 23 
2:90— 7 6 6 9 £7; SS 4 | 2 3 46 
295—.] 9. | 13 8 | 16 4-27 8) 4% 8 4 75 
3-00o— | 12 | 15 | 18 | 27 SE om ew. | 4 3 97 
3°05— 9 | 14 48 ini +) 2 84S 2 80 
3:10— 8 11 f) 10 S ¢ Roe. 20-3 | 1 2 60 
3*15— 1 6 4 4 »i]owyrory=+ — 2 19 
3°20— 1 4 1 ee ee eS ee ee ee 9 
32— | — | — | — 1}| — | | 1} —|—-|- 2 
ins inet ee Seth a ood WEIR bs Chole Peele Gael oes 1 
| 
Totals | 54 | 76 | 63 | 95 | 1 | 21 | 23 | 19 | 19 | 18 | 439 















































TABLE D*. 


A Cooperative Study 


Organic Correlation. Value of Green Ground Colour and Breadth. 


Value of Ground Colour. 
Organic Correlation. Value of Green Ground Colour and Length.
Value of Ground Colour.
Value of Brown Ground Colour and Index.


Value of Ground Colour. 


TABLE F*. Organic Correlation. 
TABLE F".
Organic Correlation. Value of Green Ground Colour and Index.


Value of Ground Colour. 
















































































Index | 4, | B | C, | Dy | wz | a 16 | I, | Ky | Totals 
Ce a a ee oe ee rr ee Ge ies ees ie 1 
ee ee Sa es i ee er Me 1 | 
ee 2 2 S |. | 3 2 — | — 10 
ee i ae 5 Pe ee 3 1 I 19 
66— ‘ee 4 ei.y 5 5 |u| 2 47 
68— 5 iu! wi 6} 13 | 9 7 9 | 113 | 
70— 8 |. ll 20 36 21 | 24 10 6 16 9 161 | 
a fui e leis | @ | a1 7113 6 | 157 | 
14-- 4 6 6 | 23 | 12 | 12; 17 | 10 8 3 | 101 | 
76— 5 2 3 ee 6 4 3 4 1 40 | 
oo | — 1 2 6); 1}—| 8 l 2 1 15 | 
a eae jf me | ae Ee EE ee we ese ee 2 
82— _ te — —;};—-!—-j— — — — 0 | 
aS ee a a ee ee ee Fee PP Weng ane 1 | 
86— — —_}|— eae fe eat eres olay a _ —_ —_ 1 
88—- — — _ _— —-/—-j- -- — _ 0 | 
a Oe ieee ee ae ae eer oy Poe ee Oo | 
| 99 at = aan jal peed ae B 6 = ae. tart 0 | 
| | 
| | : | } | 
| Totals} 3¢ | 51 | 71 | 133 | 85 | 85 | 69 | 47 | 62 | 32 | 669 | 
TABLE G?: 
Organic Correlation. Mottling and Ground Colour Value. 
Value of Ground Colour*. 
Mottlingt A, A; | B, | B, | C | Cs | D, | D, | E, | Ey FP, | F, | G, | G4 | A, | Hi, | re | LA | K, | RK, Totals | 
| @ 2 2| 3| 2 Ch Siete Ty et BS Meare SE as ee 
ee Zi—'s|.3) 4: 1] 5} 4] 8] 3] 9} —|—] 8] 1] S—| 8] 8] st al 
= 20 | 12 a4 | 27 | 33 | 31 48 | 54/21 | 38| 9/38/11/ 3} 8/20) 8) 24) 9/11] 487) 
d 1/—| 1/—| 1'—|—] 5] 2] 2] 1; =/—] BE] 1:—-]/—]| 1/—]—-—] 16] 
e 11} 8!15| 6/15/15! 19/26 |13/21| 5/24] 2/15) 4; 9] 3/16] 2] BF 237) 
f 7; 2, 9/ 9| 3/11! 9/94] 7] 9| 3/17] 7|/-9] 3; 6] 6] 6| &| 5 155 | 
9 os! 91 3] $1°3) Fi emi Si 8l—1 St St a 8l Si tt sit 28 SS 
h 3) 1/4] 1}/—: 1] 4] 1] 2].4}/—] 1f]~+] 2}—; 2}/-—j] 1] 1/-—] 
PF Si—] 2} 1) Bal ao) al ae a tee 8) —) oi) ee A) ee 
| ‘ | | 
| Totals 52 | 34 76 | 51 | 63 | 71 | 95 | 138 51 | 85 | 21 | 85 | 23 69 | 19 | 47 | 19 62 | 18 | 32 | 1106 | 
' u 


























* Two eggs, one given as ‘slatey grey’ and the other as blue, have no ground colour value recorded. 
+ Two eggs have no mottling recorded. 
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348 On the Nest and Eggs of the Common Tern 
TABLE N. 


Direct Homotyposis. Breadths. 
Breadth of First Egg. 
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s10— J|—}|—|—}—!| 1!'—! 4] 10] 37) 70| 64) 27) 8|—|—] 221 | 
$15-—- |} —|—|—|—!—ji—| 2)— 7| 12 27|22/ 6/1/1 77 
$:20— =)=/=|= = —j| 1/ 2| 8) 5] 8| 5 Naat es 32 
s05— F—|—|—|—)—j—|—i— aj — tm} til)? 3 
ee Ba | — [= aie toed ce, | oats —i—, —f{—|) ij—l ili 2 
Totals | 2°| 4 | 11 | 96 | 51 | a2 |182/ 239/340] 320 |a21| 77 | s2| 3 | 2 | 1502 

TABLE O. 
Direct Homotyposis. Longitudinal Girths.
Longitudinal Girth of First Egg.
Direct Homotyposis. Transverse Girths.


Transverse Girth of First Egg. 








TABLE P. 


TABLE R. 
Direct Homotyposis. Mottling with Mottling*. 
Mottling of First Egg. 
† The total number of pairs of eggs was 1592; from this table are omitted the four pairs which arise
from one blue egg in a three-egg clutch.
TABLE V.


Cross-Homotyposis. Longitudinal and Transverse Girths. 


Longitudinal Girth. 
(1) Introduction.


When several mental tests are applied to a group of subjects, and the correlations
between the tests (taken in pairs) are worked out, the coefficients are as a
rule found not to be arranged entirely in haphazard order, but to show a certain
degree of what has become known as hierarchical order. This means that if the
total correlation of each test with all the others is found by adding together its
coefficients, and if the tests are then arranged in sequence according to the order
of magnitude of this total correlation, they are found to be also in sequence, or
nearly so, according to the order of magnitude of their correlations with any one of
their number.

If the correlation coefficients are set out, as is convenient, in a square table
such as the following, the letters $x_1$, $x_2$, etc. being the names of certain mental
tests, and the quantities $r_{12}$, $r_{13}$, etc. the correlations between the marks scored in
these tests, then hierarchical order shows itself in the fact that each coefficient is
smaller than that on its right or than that below it, provided the tests have been
arranged in sequence according to the magnitude of the total correlation of each
with all the others.
The observed numbers in an actual experiment naturally do not in any case
come out in perfect hierarchical order, and it becomes important to have a measure
of the degree of perfection present, and some means of estimating from what
"true" correlations the observed numbers are most probably derived, and the
degree of hierarchical order among these "true" correlations. The importance of
this matter arises in the Theory of General Ability which has been proposed by
Professor Spearman, for that theory can only be considered proved if the correlations
are derived from an absolutely perfect hierarchy. A merely high degree of
hierarchical order can be attained without any General Factor whatever, by the
random selection of Group Factors. The very difficult question therefore arises of
deciding (if possible) whether the hierarchies actually observed in experimental
psychology are more probably derived from perfect hierarchies such as are postulated
in the Theory of General Ability, or from the good but not perfect hierarchies
which arise in the Theory of Group Abilities*.





A criterion which, it was hoped, would give such a measure of the perfection 
of the true hierarchy from which the observed numbers were derived by experiment, 
and which has been widely adopted for this purpose, was worked out by Dr Bernard 
Hart and Professor C. Spearman in the British Journal of Psychology for March, 
1912. The object of the present paper is to inquire into the accuracy of that 
criterion. 

(2) A Criterion for Hierarchical Order.

The underlying idea was that if the above square table of correlation coefficients
shows hierarchical order in any degree, there will be correlation between the
columns of that table taken in pairs, and that when the hierarchical order is
perfect the columnar correlation $R$ will rise to unity, except in so far as it is blurred
by the sampling errors, which obviously cannot increase an already perfect correlation,
but can only decrease it. Let us write dashed letters throughout for the
true values of the various quantities, which in ordinary experiment are unknown,
reserving undashed letters for their measured values. We then have:

$r'$ = true correlation coefficient,
$e$ = its sampling error on one occasion, so that $r = r' + e$,
$\bar{r}'$ = mean of the column of true values $r'$,
$\bar{r}$ = mean of the column of observed values $r$.

In finding these means, that coefficient is omitted which has no partner in the
column with which correlation is being found. Write also

$\rho'$ = $r'$ measured from the mean of the true column, i.e. $\rho' = r' - \bar{r}'$, and similarly
$\rho$ = $r$ measured from the mean of the observed column, i.e. $\rho = r - \bar{r}$,
$\epsilon = \rho - \rho' = e - \bar{e}$,
where $\bar{e}$ is the mean of the column of $e$'s.


* See G. H. Thomson, "The Hierarchy of Abilities," Brit. Journ. Psychol. 1919, ix. p. 337 and
"The Cause of Hierarchical Order among the Correlation Coefficients of a Number of Variates taken in
Pairs," Roy. Soc. Proc. A, xcv. p. 400 (April 1st, 1919).
Then for two columns $a$ and $b$, the true columnar correlation which we desire
to know is
$$R' = \frac{S(\rho'_{xa}\,\rho'_{xb})}{\sqrt{S(\rho'_{xa}\,\rho'_{xa})\;S(\rho'_{xb}\,\rho'_{xb})}} \qquad (1),$$
by the Bravais-Pearson product-moment formula, $S$ indicating summation over the
various values of $x$, i.e. summation up the column.
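A minimal sketch of the columnar correlation may help fix ideas. It applies the product-moment formula to two columns of a square correlation table, omitting the rows that have no partner in the other column. The table used is hypothetical: it is built from assumed single-factor loadings, so that it forms a perfect hierarchy and the columnar correlation should come out at unity.

```python
from math import sqrt

def columnar_correlation(table, a, b):
    """Product-moment correlation between columns a and b of a square correlation table.

    Rows a and b are omitted, since r_aa and r_bb do not exist and so r_ab and r_ba
    have no partner in the other column.
    """
    rows = [x for x in range(len(table)) if x not in (a, b)]
    col_a = [table[x][a] for x in rows]
    col_b = [table[x][b] for x in rows]
    mean_a = sum(col_a) / len(col_a)
    mean_b = sum(col_b) / len(col_b)
    rho_a = [r - mean_a for r in col_a]  # deviations from the column mean
    rho_b = [r - mean_b for r in col_b]
    numerator = sum(p * q for p, q in zip(rho_a, rho_b))
    denominator = sqrt(sum(p * p for p in rho_a) * sum(q * q for q in rho_b))
    return numerator / denominator

# Hypothetical loadings on a single general factor (illustrative values only).
loadings = [0.9, 0.8, 0.7, 0.6, 0.5]
perfect_table = [
    [1.0 if i == j else gi * gj for j, gj in enumerate(loadings)]
    for i, gi in enumerate(loadings)
]

R_perfect = columnar_correlation(perfect_table, 0, 1)
print(f"columnar correlation in a perfect hierarchy: {R_perfect:.6f}")
```

With observed rather than true coefficients the same function gives the measured columnar correlation $R$, which sampling errors can only lower when the true value is unity.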


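This can be written

$$R' = \frac{S(\rho_{xa}\rho_{xb}) - S(\epsilon_{xa}\epsilon_{xb}) - S(\rho'_{xb}\epsilon_{xa}) - S(\rho'_{xa}\epsilon_{xb})}{\sqrt{\{S(\rho_{xa}\rho_{xa}) - S(\epsilon_{xa}\epsilon_{xa}) - 2S(\rho'_{xa}\epsilon_{xa})\}}\;\sqrt{\{S(\rho_{xb}\rho_{xb}) - S(\epsilon_{xb}\epsilon_{xb}) - 2S(\rho'_{xb}\epsilon_{xb})\}}} \qquad (2).$$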

In this expression, the three quantities of the form $S(\rho\rho)$ are known. The three
quantities of the form $S(\epsilon\epsilon)$ are not known, but an attempt can be made to
estimate their probable values from the known standard deviations of the correlation
coefficients. The four quantities of the form $S(\rho'\epsilon)$ are treated by Dr Hart
and Professor Spearman, in their paper, as negligible, on the ground that $\rho'$ will
not in general be correlated with $\epsilon$. It is the object of the next section of this
paper to examine the nature of the correlation of these two quantities.
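The passage from formula (1) to formula (2) rests only on the substitution $\rho' = \rho - \epsilon$, and it can be checked numerically. In the sketch below the "true" coefficients and their errors are simulated values invented for the check (uniform and Gaussian draws with a fixed seed); nothing in it is taken from experimental data.

```python
import random

random.seed(0)
n = 8  # number of rows common to the two columns (illustrative)

# Simulated true coefficients and sampling errors for columns a and b.
true_a = [random.uniform(0.2, 0.8) for _ in range(n)]
true_b = [random.uniform(0.2, 0.8) for _ in range(n)]
err_a = [random.gauss(0.0, 0.05) for _ in range(n)]
err_b = [random.gauss(0.0, 0.05) for _ in range(n)]
obs_a = [t + e for t, e in zip(true_a, err_a)]
obs_b = [t + e for t, e in zip(true_b, err_b)]

def centred(values):
    """Deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def S(u, v):
    """Sum of products up the column."""
    return sum(p * q for p, q in zip(u, v))

rho_true_a, rho_true_b = centred(true_a), centred(true_b)  # rho'
rho_a, rho_b = centred(obs_a), centred(obs_b)              # rho
eps_a, eps_b = centred(err_a), centred(err_b)              # epsilon

# Numerator of (1) against numerator of (2).
numerator_1 = S(rho_true_a, rho_true_b)
numerator_2 = S(rho_a, rho_b) - S(eps_a, eps_b) - S(rho_true_b, eps_a) - S(rho_true_a, eps_b)

# One factor under the square root of (1) against the matching factor of (2).
denominator_1 = S(rho_true_a, rho_true_a)
denominator_2 = S(rho_a, rho_a) - S(eps_a, eps_a) - 2 * S(rho_true_a, eps_a)

print(abs(numerator_1 - numerator_2), abs(denominator_1 - denominator_2))
```

Both printed differences are zero apart from floating-point rounding, so (2) is an exact rewriting of (1); any approximation enters only when the terms in $S(\rho'\epsilon)$ are neglected.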





(3) The Relationship between the Correlation Coefficients and their Sampling Errors, 
in the Case of Correlation between a Number of Variates taken in Pairs. 


Consider the formula for the standard deviation of a correlation coefficient, viz.
$$\sigma_r = \frac{1 - r^2}{\sqrt{N}} \qquad (3),$$
where $N$ is the number in the sample. It follows from this that the larger
correlation coefficients will probably have the smaller sampling errors $e$, disregarding
the sign of $e$ for the moment.
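A few values of formula (3) make the point concrete. The sample size and the three coefficients below are hypothetical, chosen only to show how the standard deviation shrinks as the coefficient grows.

```python
from math import sqrt

def sigma_r(r, n):
    """Standard deviation of a correlation coefficient r from a sample of n, by formula (3)."""
    return (1 - r * r) / sqrt(n)

N = 100  # hypothetical sample size
coefficients = (0.2, 0.5, 0.8)
deviations = [sigma_r(r, N) for r in coefficients]

for r, s in zip(coefficients, deviations):
    print(f"r = {r:.1f}  sigma_r = {s:.4f}")
```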


But these signs of the quantities $e$ are not likely to be indiscriminately positive
and negative. On the contrary, they will have a tendency to be either all positive
or all negative, if, as is the case in most of the columns of coefficients considered
by Professor Spearman, the correlations in the square table are mainly positive.
The errors in the correlation of a variate $x_1$ with a variate $a$ are themselves
correlated with the errors in the correlation of the variate $a$ with another variate
$x_2$, according to the formula*

$$R_{r_{x_1 a}\,r_{x_2 a}} = r_{x_1 x_2} - \frac{r_{x_1 a}\,r_{x_2 a}\,(1 - r_{x_1 x_2}^2 - r_{x_1 a}^2 - r_{x_2 a}^2 + 2\,r_{x_1 x_2}\,r_{x_1 a}\,r_{x_2 a})}{2\,(1 - r_{x_1 a}^2)(1 - r_{x_2 a}^2)} \qquad (4).$$

That is, the correlation of the sampling errors of $r_{x_1 a}$ with the sampling errors of
$r_{x_2 a}$ depends chiefly upon $r_{x_1 x_2}$. To illustrate, let us take three correlations from
an experiment in psychology, carried out by Mr Wyatt†.


* Karl Pearson and L. N. G. Filon, "On the Probable Errors of Frequency Constants," Phil. Trans.
of the Royal Soc. 1898, cxci. A, p. 259.

† Stanley Wyatt, "The Quantitative Investigation of Higher Mental Processes," Brit. Journ.
Psychol. 1913, vi. p. 131.








358 On Hierarchical Order among Correlation Coefficients 


If we let x_1 be the mental test “Rearranged Letters,”
          x_2 be the mental test “Missing Digits,”
          a   be the mental test “Analogies,”

the values there found were

r_{x_1 a} = 0.63,

r_{x_2 a} = 0.61.

Then by the above formula the correlation of the errors of these two coefficients
depends chiefly upon r_{x_1 x_2}, whose measured value is 0.63. Using the full formula,
and employing the measured values in default of the true ones, the correlation
between r_{x_1 a} and r_{x_2 a} turns out to be 0.47. It is therefore (to an extent indicated
by this value) probable that they are either both too large or both too small.
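The figure 0.47 can be checked numerically. The sketch below assumes that formula (4) has the Pearson and Filon form r_{x_1 x_2} minus r_{x_1 a} r_{x_2 a} (1 − r²_{x_1 x_2} − r²_{x_1 a} − r²_{x_2 a} + 2 r_{x_1 x_2} r_{x_1 a} r_{x_2 a}) divided by 2 (1 − r²_{x_1 a})(1 − r²_{x_2 a}); the function name is ours and is only illustrative.

```python
def error_correlation(r_x1a, r_x2a, r_x1x2):
    """Correlation between the sampling errors of r_x1a and r_x2a.

    Assumed reading of formula (4): the leading term is r_x1x2, from which
    a correction involving all three coefficients is subtracted.
    """
    bracket = (1 - r_x1x2**2 - r_x1a**2 - r_x2a**2
               + 2 * r_x1x2 * r_x1a * r_x2a)
    correction = (r_x1a * r_x2a * bracket) / (2 * (1 - r_x1a**2) * (1 - r_x2a**2))
    return r_x1x2 - correction


# Mr Wyatt's measured values, used in default of the true ones.
wyatt = error_correlation(r_x1a=0.63, r_x2a=0.61, r_x1x2=0.63)
print(round(wyatt, 2))  # 0.47
```

The leading term r_{x_1 x_2} = 0.63 is reduced by a correction of about 0.16, which is why the result is said to depend chiefly upon r_{x_1 x_2}.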
The same argument holds, in varying degrees, for the other correlations all over
Mr Wyatt’s table, which are all positive. They have a tendency to be either
all too large or all too small: in other words, the e’s tend to be all of the same
sign. The relationship between the correlation coefficients of a column, and their
errors, can therefore be summed up in the following table, in which the symbol
|e| denotes the magnitude of e regardless of sign.

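TABLE I.

Columns:  r′  |  |e|  |  ρ′  |  ε or ε  |  ρ′ε or ρ′ε

[The symbolic entries of this table are illegible in the scan; each column is described in the paragraph that follows.]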


The first column shows the true correlations r′ arranged in order of magnitude.
The second column expresses the fact that the sampling errors on any occasion
will probably be arranged in the reverse order of magnitude, disregarding their
signs. The third column shows the correlation coefficients measured from their
mean. The upper ρ′’s are then positive, and the lower negative, and also, what is
not shown in the table, the absolute values increase upwards and downwards from
the point where the signs change. The fourth (double) column shows the probable
arrangement of the signs of the quantities ε. If the e’s are all tending to be
positive, then the left-hand member of the double column gives the arrangement,
while if the e’s all tend to be negative, the other member of the double column
does so. As shown in the last (double) column, therefore, the quantities ρ′ε tend
either to be nearly all negative or nearly all positive. For a very small sample
the signs of ρ′ε will no doubt be quite irregularly arranged. But with such a
small sample, even if ρ′ and ε were really uncorrelated, it would be most unlikely
for S(ρ′ε) to be negligible. As the sample increases the signs tend to settle down
to the above arrangement, and S(ρ′ε) does not tend to disappear compared with
S(εε), but only to take on one or other of alternative values. It will only be zero
when all the errors are zero, i.e. when no corrections are needed to R. The
distribution of S(ρ′ε) about zero in a number of samples of the same size will not,
that is, show a maximum at zero, but a minimum, as is shown qualitatively in
Fig. 1.

[Fig. 1. Distribution of S(ρ′ε) about zero in samples of the same size; horizontal axis S(ρ′ε), running from negative through 0 to positive.]


To show the order of magnitude of these neglected quantities, consider the
following example, in which the true correlations are known a priori. These, with
their observed values, were as follows:

True r′_{xa}    Observed r_{xa}    e
0.730           0.703              −0.027
0.598           0.708              +0.115
0.356           0.367              +0.011
0.174           0.337              +0.163
0.167           0.281              +0.114
0.120           0.371              +0.251
0.116           0.112              −0.004
0.112           0.133              +0.021

The variates here were made up of dice throws, and the sample was one of 36
cases. Here, knowing as we do the actual true correlations* which would be given
by the whole population or by a sufficiently large sample, we can form the
quantities S(ε²_{xa}) and 2S(ρ′_{xa} ε_{xa}). They prove to be 0.064 and −0.116. It is
clearly unwise to neglect the latter of these in comparison with the former.
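The two sums can be re-formed from the eight pairs of values listed above. The following is a minimal sketch, assuming that ρ′ denotes a true correlation measured from the column mean and that ε denotes an error e measured from the mean error of the column; the errors are entered as printed, and the variable names are ours.

```python
# True correlations and sampling errors e, as printed in the example above.
true_r = [0.730, 0.598, 0.356, 0.174, 0.167, 0.120, 0.116, 0.112]
errors = [-0.027, 0.115, 0.011, 0.163, 0.114, 0.251, -0.004, 0.021]


def deviations(values):
    """Measure each value from the mean of its column."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


rho_true = deviations(true_r)   # assumed meaning of rho'
epsilon = deviations(errors)    # assumed meaning of epsilon

s_ee = sum(e * e for e in epsilon)
two_s_rho_e = 2 * sum(p * e for p, e in zip(rho_true, epsilon))

print(round(s_ee, 3), round(two_s_rho_e, 3))
```

On these assumptions the two sums come out within about 0.002 of the printed 0.064 and −0.116; the neglected cross term is nearly twice the size of the retained one and of opposite sign.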


(4) Experimental Demonstrations in Cases where the True Values of the 
Columnar Correlations are known a priori. 


The formula at which Dr Hart and Professor Spearman eventually arrive, after
neglecting these quantities and making various other assumptions, is

$$R' = \frac{S(\rho_{xa}\,\rho_{xb}) - (n-1)\, r_{ab}\, \bar{\sigma}_{xa}\, \bar{\sigma}_{xb}}{\sqrt{S(\rho_{xa}^2) - (n-1)\,\bar{\sigma}_{xa}^2}\;\sqrt{S(\rho_{xb}^2) - (n-1)\,\bar{\sigma}_{xb}^2}} \qquad (5),$$
where the σ’s are standard deviations of the correlation coefficients, the bar
indicates mean values for the column, and n is the number of pairs of correlation
coefficients concerned, in the two columns. In using their formula, its authors do
not apply it to all the pairs of columns in the square table. They say: “In any
case the correction must be kept within limits: as usual, the larger the correction
the less it is to be trusted. If the sampling errors are large enough, they
eventually will quite swamp the true differences of magnitude upon which the
observed correlation should be based. In this case, the true correlation is beyond
ascertainment; any attempt at correction is merely illusory. To avoid this, and at
the same time to ensure impartial treatment of all data, it is necessary to fix
beforehand some definite limit to the feasibility of correction. We have here adopted
the following standard: in order to attempt to estimate the correct correlation
between columns, it is required that in each of these columns the mean square
deviation should be at least double the correction to be applied to that deviation.”

* G. H. Thomson, “A Hierarchy without a General Factor,” Brit. Journ. Psychol. 1916, VIII. p. 271.

That is to say, the equation (5) is not to be used unless, in each factor of the
denominator, S(ρ²) is at least double its correction (n − 1)σ̄². This condition (the
“correctional standard”) will be found to be important.
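The criterion and the correctional standard can be put into a short computational sketch. It assumes that each σ is obtained from formula (3), that ρ is a coefficient measured from its column mean, and that n is simply the number of coefficients paired in the two columns; the function names and the sample columns are hypothetical and are not taken from Dr Hart and Professor Spearman's paper.

```python
import math


def sigma_r(r, sample_size):
    """Standard deviation of a correlation coefficient, formula (3)."""
    return (1 - r * r) / math.sqrt(sample_size)


def hart_spearman(col_a, col_b, r_ab, sample_size):
    """Return (observed R, corrected R' or None, passes correctional standard)."""
    n = len(col_a)
    mean_a = sum(col_a) / n
    mean_b = sum(col_b) / n
    rho_a = [r - mean_a for r in col_a]
    rho_b = [r - mean_b for r in col_b]
    s_ab = sum(p * q for p, q in zip(rho_a, rho_b))
    s_aa = sum(p * p for p in rho_a)
    s_bb = sum(q * q for q in rho_b)
    # Mean standard deviation of the coefficients in each column.
    sig_a = sum(sigma_r(r, sample_size) for r in col_a) / n
    sig_b = sum(sigma_r(r, sample_size) for r in col_b) / n
    corr_aa = (n - 1) * sig_a ** 2
    corr_bb = (n - 1) * sig_b ** 2
    corr_ab = (n - 1) * r_ab * sig_a * sig_b
    observed = s_ab / math.sqrt(s_aa * s_bb)
    # Correctional standard: each S(rho^2) at least double its correction.
    passes = s_aa >= 2 * corr_aa and s_bb >= 2 * corr_bb
    corrected = None
    if s_aa > corr_aa and s_bb > corr_bb:
        corrected = (s_ab - corr_ab) / math.sqrt((s_aa - corr_aa) * (s_bb - corr_bb))
    return observed, corrected, passes


# Hypothetical columns of four coefficients each.
column_a = [0.8, 0.6, 0.4, 0.2]
column_b = [0.7, 0.5, 0.4, 0.1]
large = hart_spearman(column_a, column_b, r_ab=0.5, sample_size=10**8)
small = hart_spearman(column_a, column_b, r_ab=0.5, sample_size=10)
print(large)
print(small)
```

With a very large sample the corrections vanish and R′ coincides with the observed R. With a sample of ten the same columns fail the correctional standard, and the corrected value, if computed regardless, is far above unity.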


It is clear that the accuracy of this formula (5) could be conveniently tested
were we in possession of material in which all the true correlations were known
a priori, in addition to the observed correlations found in samples. Such material
is supplied in perfection by correlated dice throws.

First Example. The first experiment with dice of the above nature which
I carried out was described in the Brit. Journ. Psychol. 1916, VIII. There ten
variates were artificially made up of group factors and specific factors, without any
general factor, so as to make a very good hierarchy, which gave the following
results when tested by the Hart and Spearman criterion.











TABLE II.

Columns passing    Observed columnar    True columnar    The Hart and Spearman
correctional       correlation R        correlation      corrected columnar
standard                                                 correlation R′

ab                 0.95                 1.00             1.04
ac                 0.89                 0.99             1.00
be                 0.91                 1.00             1.01
cd                 0.90                 1.00             1.11

Means              0.91                 1.00             1.04





Here the exaggeration of the Hart and Spearman R′ is not very noticeable, for
the hierarchy is in any case almost perfect. Indeed in this case I took some pains
to make the arrangement of group factors imitate a perfect hierarchy very closely,
for the sake of emphasising the point I then wished to make, viz., that such group
factors can, unaided by any general factor, approach exceedingly close to perfection
of hierarchical order. I did not then realise that the pains I took over this point
were hardly necessary, for random sampling of the group factors gives good
hierarchies, though such perfection as the above would be unlikely to arise from
chance.


Second Example. For a second example I have therefore chosen a hierarchy
formed thus by the chance sampling of group factors, without any general factor,
and moreover one which shows considerable departure from perfection of hierarchical
order, it being the least perfect of those which I have up to the present formed in
this way. The mode of construction of the variates is given in detail in Roy. Soc.
Proc. A. XCV. (April 1st, 1919) on page 402, and the theoretical correlations on
page 403 of that article. The latter show a certain degree of hierarchical order,
though not very high, the true mean columnar correlation R for all pairs of
columns being 0.59.

Dice were now thrown to form 20 measures of each of the ten variates,
x_1, x_2, x_3, ... x_10.

First the magnitudes of the group factors (which it will be recalled were in that 
article named after the cards of a playing pack) were decided by throwing dice, 
with the following results. 














TABLE III.

Number identifying the subject (rows, 1 to 20) against Name of Group Factor (columns).

[The dice scores forming the body of this table are illegible in the scan.]








Using these numbers, we can make up the scores for the group factor portion 
of each of the ten tests described in the article quoted. There results (see Table IV). 

The proper number of dice, as described in the article quoted, were then 
thrown for each test and for each subject to represent the specific factors, and the 
scores of these dice added to the scores given in the last table, the resulting total 
being the complete score for each subject in each test (Table V). 

From the dice scores the observed correlations between the variates can be
calculated, just as the correlations between mental tests are calculated. Using
the product-moment formula we obtain the set of values in Table VI, arranged
in hierarchical order, only slightly different from the true hierarchical order, except
that variate a has changed its position rather violently.

[Table IV. Scores in the group factor portion of the tests, by number identifying the subject. The body of this table is not preserved in the scan.]

TABLE VI.
The Observed Hierarchy.








[The variate labels heading the rows and columns are illegible in the scan; the rows and columns stand in the observed hierarchical order, and "?" marks an illegible figure.]

  *    .72   .47   .64   .53   .50    ?    .4?   .21   .09
 .72    *    .48   .43   .7?   .48   .32   .67  -.26   .10
 .47   .48    *    .51   .46   .45   .50   .46  -.02   .24
 .64   .43   .51    *    .38   .60   .2?   .15   .29   .08
 .53   .7?   .46   .38    *    .63    ?    .33   .05  -.11
 .50   .48   .45   .60   .63    *    .22   .29  -.16   .18
  ?    .32   .50   .2?    ?    .22    *    .4?   .38   .15
 .4?   .67   .46   .15   .33   .29   .4?    *   -.20   .08
 .21  -.26  -.02   .29   .05  -.16   .38  -.20    *   -.11
 .09   .10   .24   .08  -.11   .18   .15   .08  -.11    *





The pairs of columns which pass the Hart and Spearman correctional standard 
give the following values : 


TABLE VII.

Columns passing    Observed columnar    True columnar    The Hart and Spearman
correctional       correlation R        correlation      corrected columnar
standard                                                 correlation R′

2 & 7              0.73                 0.75             0.76
6 & 7              0.63                 0.89             1.15
2 & 3              0.70                 0.60             1.01
2 & 6              0.81                 0.88             1.06
3 & 6              0.66                 0.83             1.04

Means              0.71                 0.79             1.00

True mean columnar correlation of the whole table, and not merely of the pairs
of columns selected by the correctional standard: 0.59





Dr Hart and Professor Spearman would therefore claim the hierarchy as being
a sample of a perfect one. The true mean columnar correlation for the whole
table is 0.59, the Hart and Spearman correctional standard selects pairs of columns
whose true mean columnar correlation is 0.79, and the mean value of these when
corrected according to their formula rises to unity. This example goes far, I think,
towards shaking confidence in their criterion.
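The three means quoted for Table VII follow directly from its five rows, as the short check below shows; the dictionary layout is ours, and the figures are those of the table.

```python
# Rows of Table VII: (observed R, true columnar correlation, corrected R').
table_vii = {
    "2 & 7": (0.73, 0.75, 0.76),
    "6 & 7": (0.63, 0.89, 1.15),
    "2 & 3": (0.70, 0.60, 1.01),
    "2 & 6": (0.81, 0.88, 1.06),
    "3 & 6": (0.66, 0.83, 1.04),
}

# Mean of each column, rounded to two places as in the table.
means = [round(sum(column) / len(column), 2) for column in zip(*table_vii.values())]
print(means)  # [0.71, 0.79, 1.0]
```

The corrected mean thus stands about 0.21 above the true mean of the selected pairs, and about 0.41 above the true mean of 0.59 for the whole table.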


It must, I think, be partly chance which makes it so peculiarly unfavourable to
their work: but I give it as it came. Really a very large number of such examples
is necessary, and not all of these could be expected to be so unfavourable. The
only other example which I have attempted I have carried far beyond 20 cases
without as yet reaching a point where any of the columns pass the correctional
standard. I feel that working a large number of such examples is beyond the
power of an individual, with other claims on his time, and rather a task for a
statistical laboratory with experienced computers and mechanical aids.


(5) The Effect of the Correctional Standard. 


Clearly the fact that the criterion is apparently too large in a majority of cases
requires further explanation beyond the error already pointed out of neglecting
the terms in ρ′.

The other approximations made in obtaining the criterion do not appear to be
so erroneous as this one, though their cumulative effect may explain some
anomalies. Leaving them on one side let us consider the “correctional standard”
required by Dr Hart and Professor Spearman before they admit any pair of columns.
It is this correctional standard, combined with the peculiar distribution of R′,
which chiefly is responsible for the exaggeration of perfection produced by this
criterion, and for the regularity with which an average value of unity is arrived at.

Let us examine first the actual distribution of the Hart and Spearman R’ in a 
psychological hierarchy, viz., that of Wyatt already referred to, and calculate R’ 
not only for those columns which pass the correctional standard, but also for other 
pairs of columns. What we find is that its value rises as we descend the hierarchy, 
rushing asymptotically to infinity, remaining for a time imaginary, and then 
returning. The value reaches infinity when one of the corrections in the denomi- 
nator becomes as large as the term to be corrected, and remains imaginary until 
the other term is likewise passed by its correction, when both quantities under the 
square root are negative and an arithmetically possible but meaningless value is 
again calculable. Specimen values from Mr Wyatt’s hierarchy are given in this
table.

TABLE VIII.

Pairs of Columns                               Values of the Hart and Spearman R′

Analogies and Wordbuilding                     0.93  }
Completion and Wordbuilding                    0.97  }  Passed by the
Completion and Part-wholes                     1.05  }  correctional
Wordbuilding and Part-wholes                   0.99  }  standard
Part-wholes and Memory (delayed)               0.92  }
Rearranged letters and Missing digits          1.17
Wordbuilding and Z R Test                      1.26
Sentence construction and Fables               1.33
Rearranged letters and Z R Test                Practically infinity
Nonsense syllables and Dissected pictures      Imaginary
Crossline test and Letter squares              0.35, both factors in the denominator
                                               being now negative.
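The succession of finite, infinite, imaginary and again finite values in the table follows from the signs of the two corrected quantities under the square roots. The sketch below classifies a pair of columns accordingly; it assumes principal square roots (so that two negative factors give a real product), and the numerical inputs are hypothetical rather than taken from Mr Wyatt's data.

```python
import cmath


def r_prime_status(numerator, s_aa, corr_a, s_bb, corr_b):
    """Classify R' by the signs of the corrected sums in the denominator."""
    d_a = s_aa - corr_a
    d_b = s_bb - corr_b
    if d_a == 0 or d_b == 0:
        return "infinite", None
    # Principal roots: a negative quantity contributes a factor of i.
    value = numerator / (cmath.sqrt(d_a) * cmath.sqrt(d_b))
    if d_a > 0 and d_b > 0:
        return "real", value.real
    if d_a < 0 and d_b < 0:
        return "real but meaningless", value.real
    return "imaginary", None


# Hypothetical pairs of columns, taken in descending order of the hierarchy.
print(r_prime_status(0.50, 1.0, 0.2, 1.0, 0.3))   # corrections small
print(r_prime_status(0.50, 1.0, 1.0, 1.0, 0.3))   # one correction equals its term
print(r_prime_status(0.50, 1.0, 1.2, 1.0, 0.3))   # one factor negative
print(r_prime_status(-0.10, 1.0, 1.5, 1.0, 1.4))  # both factors negative
```

The sign attached to the last case depends on the convention adopted for the roots; the paper states only that the value is arithmetically possible but meaningless.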
Expressed in diagrammatic form this and similar calculations lead to the 
conclusion that in actual practice the criterion is distributed as in Fig. 2, where 
the curve is to be understood as a “best fitting” curve among the values of R’ 
scattered, with a very considerable dispersion, on both sides of it. The line, in 
fact, ought to be a broad smudge. 
[Fig. 2. The Hart and Spearman criterion R′ (vertical axis, rising from zero) plotted against position in descending the hierarchy (horizontal axis), showing the boundary set by the correctional standard and the region in which R′ is imaginary.]

Now clearly, with a distribution of this sort, it is very important that the
boundary between the values that are to be rejected and those that are to be
accepted should be chosen with the greatest care, and not arbitrarily but
scientifically. Either sound theoretical reasons should be given for the choice of the
correctional standard, or the choice should be based empirically on experiments in
material where the truth is known a priori, as in the above dice experiments. For
obviously, by moving this boundary, we can make the final average take on almost
any value. Another point is that the criterion rushes to infinity at such speed
that its probable error must be enormous. Dr Hart and Professor Spearman,




however, give no reasons for their choice of this particular standard, upon which 
depends so much the values they obtain. The standard which they thus arbitrarily 
adopt begins admitting the criteria at just such a distance above unity as to 
balance the cases which give a criterion below unity, and entirely explains the
remarkable unanimity with which this average value unity is obtained by them in 
their calculations. 


(6) Conclusion. 


A criterion suggested by Dr Hart and Professor Spearman has been widely 
used by psychologists for the purpose of ascertaining the degree of “ hierarchical ” 
order among theoretical correlation coefficients of which only experimental values 
are known, and a Theory of General Ability has been based on the results. In the 
present paper it is however shown theoretically that an assumption made in 
deducing this criterion, namely that ρ′ and ε are uncorrelated and the sums S(ρ′ε)
negligible, is incorrect. The quantity e taken regardless of sign is strongly
correlated with ρ′, and its signs tend to be either all the same as, or all different from,
those of ρ′. The distribution of the sums S(ρ′ε) shows a minimum, not a maximum,
at zero. 


Otherwise the paper is empirical, and applies the criterion in question to 
correlated dice throws. In the cases tried, this criterion exaggerates the perfection 
of the hierarchy considerably, claiming a quite poor hierarchy formed by random 
group factors as being perfect (true mean columnar correlation 0.59, the Hart and
Spearman R′ = 1.00). The reason for this exaggeration, and for the unanimity
with which in so many experiments the average value unity has been found for the
Hart and Spearman criterion, appears to be mainly the peculiar distribution of
this quantity, combined with the action of the “correctional standard” adopted,
which commences admitting the criteria at such a distance above unity as to 
balance those which are less than unity. 














MISCELLANEA. 


I. Inheritance of Psychical Characters. 
By KARL PEARSON, F.R.S. 


In view of the papers that have been published on the inheritance of intelligence, it is 
strange that there should still remain any doubt that psychical characters are inherited at the 
same rate as physical characters. But having regard to the existence of that doubt any material 
bearing on the point deserves special recognition and emphasis. 


In a recent contribution to the Journal of Delinquency, Vol. IV. p. 46, Dr Kate Gordon gives
the results of her tests by the Binet-Simon method of the intelligence of the children in three
orphanages in California. Among other data she gives, almost as an aside, a small table for
the correlation in intelligence-quotients of 91 pairs of siblings. This table appears to me
of very considerable interest and supplies what is occasionally lacking, a nearly uniform
environment* both in training and in nourishment to the pairs dealt with. Those who dislike the
idea that the mental as well as the physical characters are largely fixed for us by our ancestry 
are apt to attribute—regardless of known measurements of the intensity of environmental 
influence—the correlation of pairs of siblings for mental characters to a differential environment 
of the pairs, i.e. to differential family or home training. Hence the value of data obtained 
within the walls of an orphanage, as tending to minimise this differentiation. 


The Intelligence Quotient, it will be remembered, is the ratio of the mental age as given by an 
intelligence test of the Binet-Simon type to the actual age. The accompanying correlation table 
is the ‘scatter’ table of Dr Gordon rendered symmetrical, so that we can enter with either member 
of the pair. The probable error must, of course, be calculated for the correlation on the basis of 
91 pairs, but for the mean and standard-deviation on 182 individuals. We find: 


Mean Intelligence Quotient ................... = 92.857 ± 0.836,
Variability in Intelligence, S.D. ............ = 16.727 ± 0.591,
Coefficient of Variation ..................... = 18.014 ± 0.657,

Correlation of Intelligence between Siblings, r = 0.5082 ± 0.0524.
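The probable errors attached to these four constants can be recovered from the usual normal-theory expressions, taking 182 individuals for the mean, standard deviation and coefficient of variation, and 91 pairs for the correlation. The sketch below assumes those expressions (0.67449 times the respective standard error); that these are the formulae actually used is an inference from the agreement of the results.

```python
import math

Q = 0.67449  # ratio of probable error to standard error for a normal variate


def pe_mean(sd, n):
    return Q * sd / math.sqrt(n)


def pe_sd(sd, n):
    return Q * sd / math.sqrt(2 * n)


def pe_cv(v, n):
    # v is the coefficient of variation expressed as a percentage.
    return Q * v / math.sqrt(2 * n) * math.sqrt(1 + 2 * (v / 100) ** 2)


def pe_r(r, pairs):
    return Q * (1 - r * r) / math.sqrt(pairs)


individuals, pairs = 182, 91
results = (
    round(pe_mean(16.727, individuals), 3),
    round(pe_sd(16.727, individuals), 3),
    round(pe_cv(18.014, individuals), 3),
    round(pe_r(0.5082, pairs), 4),
)
print(results)  # (0.836, 0.591, 0.657, 0.0524)
```

The coefficient of variation itself is 100 × 16.727 / 92.857 = 18.014.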


At first sight it might seem as if the mean Intelligence Quotient was somewhat low. For a 
normal child it should be theoretically 100, but so much depends on the nature of the tests 
used and also on the manner in which they are applied that we cannot dogmatise on this point. 
In some recent American data we found a very low intelligence quotient among literate adults, 
and the result was clearly due to the nature and method of applying the test. The coefficient 
of variation in this case rose to the high value of 38.52, fully double the value we have found in
other cases. We may note that the coefficient of variation is also large in the present case, 
which is distinctly against intelligence being much influenced by environmental conditions— 


* The ideal method would be to take all the siblings in a very large orphanage, such for example as 
the Reedham asylum, and select if the numbers should prove adequate only the children who had 
entered the orphanage at an early age. 


[Table: Inheritance of Intelligence Quotient in Siblings. Intelligence Quotient of First Sibling. The body of this correlation table is not preserved in the scan.]


for in this instance we have considerable approximation to uniformity of environment. For
261 normal children examined by the Binet-Simon method by Dr Jaederholm, I find the
coefficient of variation in intelligence as measured in mental years to be 19.476. For 420
children in two schools I find a coefficient of variation in general intelligence of 21.986, and for
1725 children in eight schools I find a coefficient of variation in terms’ marks of 23.133.
These are somewhat greater than the variability obtained for the orphanage children, but do
not show the great increase some might anticipate from variety in home and school training,
and the increase of the last two results may be solely due to the different standards imposed by
the judgments of a variety of teachers instead of, as in Jaederholm’s and our present cases,
an identical series of tests made by a single psychologist. The noteworthy value, however, lies
in the correlation of 0.508. The values obtained for 12 cases of physical characters in siblings
(Biometrika, Vol. II. p. 387) have exactly this value for their mean. No stress can, of course, be
laid on the absolute identity considering the smallness of the present series, but much stress
may be laid on the approximation of the two results.


But the present data are of further interest, although they are so slender, when we compare
the results to be obtained from them with those for a far longer series of pairs of siblings 
obtained by the method of “broad-categories.” This series is also formed from pairs of siblings 
who are children. They belonged to a great variety of schools taken throughout Great Britain. 
Every variety of environment, every variety of educational and home training is therefore 
included. Accordingly if the intellectual resemblance of siblings were the result or largely 
the result of differential treatment, we ought to anticipate a great increase of correlation in this 
material over that of the material drawn from the Californian orphanages. We have also the 
possibility of obtaining light on two further problems : 


(i) Whether the method of “broad-categories” really does give results markedly inferior 
to the Binet-Simon method of direct quantitative measurement. 


(ii) What is the approximate value of the “mentace” or unit of intelligence in terms of a
unit obtained from a Binet-Simon test.


The definitions of the “broad-categories” used by the Galton Laboratory in its intelligence
investigations have already been published in this journal*, and a “mentace” has been defined
as the 1/100 part of the range which limits the category “Intelligent”†. Now if we compare the
two series, the one determined by “broad-categories” and the other by the Binet-Simon test, for
the total frequencies up to the beginning and up to the end of the range “Intelligent,” we shall
have a first approximation (on the assumption that both series are measuring the same general
intelligence character and both approximate to normal distributions) to the absolute value of a
“mentace.” I find that my mentace is equal to 0.1604 of Dr Gordon’s intelligence quotient
units, or with the average age of 10.2 (which appears to have been that of her children) it equals
about six days of mental growth of children at this age‡. Roughly we might say that a mentace
is equal to about a week’s mental growth at the age of ten years. In estimating the meaning of
this statement we must remember that mental growth is very rapid at this age.
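The conversion of a mentace into days of mental growth is a single line of arithmetic. The sketch below assumes that one intelligence quotient unit at a given age corresponds to one hundredth of that age in mental years, as follows from the definition of the quotient, and takes a year of 365.25 days; the variable names are ours.

```python
MENTACE_IN_IQ_UNITS = 0.1604   # value found above
mean_age_years = 10.2          # average age of Dr Gordon's children

# One unit of intelligence quotient = 1/100 of the actual age, in mental years.
mental_years_per_iq_unit = mean_age_years / 100
mentace_in_days = MENTACE_IN_IQ_UNITS * mental_years_per_iq_unit * 365.25
print(round(mentace_in_days, 1))  # 6.0
```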


As the American data pool children of both sexes I have for purposes of comparison done the 
same. The following table represents my material for 5602 children in 2801 pairs, each pair 
being entered either way so as to produce a symmetrical table. 


* Biometrika, Vol. VII. p. 93.

† Biometrika, Vol. V. p. 109.

‡ The reader will of course avoid the conclusion that the mentace is an intelligence unit varying
with age. It is the time rate of growth of intelligence which varies with age, and we must state
a particular age in evaluating the mentace in terms of growth of intelligence.
































































Contingency Table for General Intelligence in Siblings.

Columns: Category of Intelligence of First Sibling. Rows: Category of Intelligence of Second Sibling.

                     Quick                    Slow                    Slow     Very
                     Intelligent  Intelligent Intelligent  Slow      Dull     Dull     Totals

Quick Intelligent    312          263.25      131.75       40        16.25    4.25     767.5
Intelligent          263.25       876.5       564.25       172.5     36.25    14.75    1927.5
Slow Intelligent     131.75       564.25      697.5        249.5     72.5     27       1742.5
Slow                 40           172.5       249.5        219.5     80       12       773.5
Slow Dull            16.25        36.25       72.5         80        68       18       291
Very Dull            4.25         14.75       27           12        18       24       100

Totals               767.5        1927.5      1742.5       773.5     291      100      5602


The first question that arises is that of the method to be employed in the reduction of this
table. The answer is fairly straightforward. The only legitimate method is that of corrected
contingency. The mean square contingency of this 6 × 6 fold table is

$$\phi^2 = 0.2931833.$$

Of the two methods of correcting this raw mean square contingency* for number of cells one
leads to

$$\phi^2 = 0.2869355,$$

and the other to

$$\phi^2 = 0.2885659,$$

giving the nearly equivalent results for the contingency coefficient C_2,

C_2 = 0.4722 and C_2 = 0.4732.

The class-index correlations are the same in both directions and we find:

$$r_{xC_x}\, r_{yC_y} = 0.91738.$$

Hence finally we have for the correlation

r = 0.5147 and r = 0.5158,

and accordingly it is amply adequate to take the correlation of siblings in general intelligence
to be 0.515, which agrees excellently with the value 0.508 found from Dr Gordon’s data.


But as the bald figures 0.508 and 0.515 convey little to the mind untrained to statistical
appreciations, I have attempted to provide an illustrative diagram: see Plate VII.


Assuming normal distribution for the marginal totals and the arrays, I have superposed the
means of the two systems (General Intelligence and Dr Gordon’s Stanford Revision of the
Binet-Simon tests) and equated their S.D.’s. Using I.Q.U. for an intelligence quotient unit, i.e. a
change of one digit in the intelligence quotient (or 100 mental age / physical age), we find:


Mean = 578.909 mentaces = 92.857 I.Q.U.’s, Standard Deviation = 95.5566 mentaces = 15.3215 I.Q.U.’s.

Thus a mentace = 0.1604 I.Q.U.


* It is hoped to publish shortly the long-delayed memoir on contingency corrections. The delay
has largely arisen from the labour involved in reducing adequate material by way of illustration.
Boundaries between Categories and Means of Categories measured from
the Mean of Intelligence.

                                          Mentaces         I.Q.U.’s

Quick Intelligent and Intelligent         104.5294         16.7602
Intelligent and Slow Intelligent            4.5294          0.7263
Slow Intelligent and Slow                 −77.7640        −12.4686
Slow and Slow Dull                       −141.1658        −22.6845
Slow Dull and Very Dull                  −200.6975        [illegible]

Mean of Quick Intelligent                 152.967          24.527
Mean of Intelligent                        49.756           7.978
Mean of Slow Intelligent                  −34.410          −5.517
Mean of Slow                             [illegible]      [illegible]
Mean of Slow Dull                        −165.590         −26.551
Mean of Very Dull                        −235.299         −37.728
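The reduction of the contingency table for general intelligence in siblings can be retraced as follows. This is a sketch under stated assumptions: the raw mean square contingency is computed from the 6 × 6 table as the sum of n_ij² / (n_i. n_.j) less unity, the contingency coefficient is taken as the square root of φ² / (1 + φ²), and the final correlation as C_2 divided by the product 0.91738 of the two class-index correlations. The two corrected values of φ² are entered as printed, since the corrections for number of cells are not given in the text.

```python
import math

# 6 x 6 contingency table for general intelligence in 2801 pairs of siblings,
# each pair entered both ways (5602 entries in all).
table = [
    [312,    263.25, 131.75, 40,    16.25, 4.25],
    [263.25, 876.5,  564.25, 172.5, 36.25, 14.75],
    [131.75, 564.25, 697.5,  249.5, 72.5,  27],
    [40,     172.5,  249.5,  219.5, 80,    12],
    [16.25,  36.25,  72.5,   80,    68,    18],
    [4.25,   14.75,  27,     12,    18,    24],
]


def mean_square_contingency(t):
    """Raw phi^2: sum of n_ij^2 / (row total * column total), less unity."""
    rows = [sum(r) for r in t]
    cols = [sum(c) for c in zip(*t)]
    return sum(t[i][j] ** 2 / (rows[i] * cols[j])
               for i in range(len(t)) for j in range(len(t[0]))) - 1


def contingency_coefficient(phi2):
    """Coefficient of mean square contingency from phi^2."""
    return math.sqrt(phi2 / (1 + phi2))


total = sum(sum(r) for r in table)
raw_phi2 = mean_square_contingency(table)

# Corrected phi^2 values and class-index product, as printed above.
c2_values = [contingency_coefficient(p) for p in (0.2869355, 0.2885659)]
r_values = [c / 0.91738 for c in c2_values]

print(total, round(raw_phi2, 3))
print([round(c, 4) for c in c2_values])
print([round(r, 4) for r in r_values])
```

The raw value so obtained lies close to the printed 0.2932, and the two corrected values lead to C_2 of about 0.4722 and 0.4732 and to correlations of about 0.5147 and 0.5158, in agreement with the figures in the text.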


After careful consideration of a number of factors we divided* our “Intelligent” category of
100 mentaces range into “Fair intelligence” for the first 45 mentaces and “Capable” for the
remaining 55 mentaces. Our “Quick Intelligent” category was again subdivided into a range
of 200 mentaces corresponding to “Specially Able” and to “Genius” or the 1.4 per mille who
exceed the mental type by more than 300 mentaces. The “Very Dull” were again subdivided
at 300 mentaces less than the mean and the 1.4 per mille beyond this may be looked upon as
mentally defective. This per mille of mental defectives corresponds fairly well with the primary
school returns. Thus the average ‘genius’ will have 312 mentaces or be almost exactly 50
I.Q.U.’s above mediocrity, i.e. with a mean of 143 I.Q.U.’s, and the average mentally defective
312 mentaces or 50 I.Q.U.’s below the type, i.e. will have about 43 instead of 93 for intelligence
quotient†. These limits are marked on our diagram.


Dr Gordon’s results therefore bring out a point that was not correct in my diagram of 1906.
The zero of intelligence is not about 300 mentaces below mediocrity, but nearer 600! Even an
“imbecile” girl has an intelligence quotient of 29, or some 180 mentaces, where I in 1906
assumed she should be credited with none. I still think complete imbecility should be marked
by a total absence of mentaces or by a zero intelligence quotient. It appears better therefore to
talk of those with intelligence less by 300 mentaces than the mean as mental defectives‡. The
problem is rather theoretical than practical, depending not so much on the existence of zero
intelligence, as on the limen or threshold value at which we are able to realise its existence.
Anyhow the conclusion seems to be that we must search a large number of millions if we wish
to find an individual absolutely without intelligence.


Examining our diagram we note how extremely closely the black points which represent the 
means of the general intelligence categories lie on their regression line. They lie so closely that 
we might almost feel disappointed that the means for the Slow Dull and Very Dull categories 
are not equally close to the regression line. But here regard must be paid to the fact that these 
are the smallest of the categories in size; and further to disturbing factors arising from the 


* See Biometrika, Vol. v. p. 110. 

+ Dr Gordon notes a very able girl with 137 1.9.U.’s and an imbecile girl with only 29 1.9.0.’s in 
a total of 335 cases. 

‡ I wrote in 1906 (Biometrika, Vol. v. p. 111 ft.) that: “He [the median individual] can hardly
have more than 350 to 400 mentaces, for at a negative position of −350 to −400 on the scale we have
passed through the very dull group into imbecility and complete absence of reasoning power. The
child whose low grade of intelligence occurs only 3 or 4 times in 100,000 cases must be sought in the
idiot asylum.” I was probably wrong in assuming the worst type of idiot had zero intelligence.
Dr Gordon’s mean is 6·06 times her S.D., or the absolute zero of intelligence would only occur 1 in
100,000,000. This is probably excessive. Dr Jaederholm’s data appear to indicate 5·5 times as the
ratio or 1 in 12,500,000 as the occurrence. 
probability that dull children remain longer at school than very intelligent ones, and start later.
These factors act in a not easily interpretable way on the number of pairs of dull and very dull
children. 


On the other hand Dr Gordon’s plotted observations show little more than the variations due 
to random sampling in such a slender series. Both sets of observations together undoubtedly 
indicate within the limits of error one and the same law of relationship*. It is almost impossible
to conceive that such diverse environmental conditions rather than a fundamental germinal 
relation could produce such concordance. The conclusion which is emphasised by material 
drawn by such different methods from such very different environments is that the relation of 
intelligence between siblings is fixed by something more innate than environment. That some- 
thing more innate, more constant and more universal in its domination can only be the hereditary 
factor. 


Of course the results in the present paper for the relation between Intelligence Quotient and 
Mentace can only be considered as suggestions until we have far longer series of pairs of siblings 
tested by the Binet-Simon or allied methods. But they serve to indicate that very fruitful work
can be achieved in this direction, and even the present data owing to their relatively limited
environmental conditions may help to dispel the notion—largely based on prejudice, not on
acquaintance with actual measurements—that differential environment is the source of resem- 
blance between siblings. 


II. Variation and Distribution of Leaves in Sassafras. 
By N. M. GRIER. 


The following note is made on the basis of examination of ten sassafras trees and 102 seedlings 
near Pittsburg, Pa., and eight trees near St Louis, Mo. Only three kinds of leaves were met 
with, three-lobed, two-lobed and single-lobed, but it may be inferred that the same laws will 
govern the distribution of the four-, five- and six-lobed forms described by Berry some years ago in
the Botanical Gazette. 


The single-lobed leaves are in great preponderance, constituting two-thirds of the foliage in 
Pittsburg specimens, while, near St Louis, three trees were observed in which other than single- 
lobed leaves were wanting. In these an extensive self-pruning had taken place. The terminal 
leaves of young branches are single-lobed, although there may be an occasional two-lobed leaf.
Tops of trees are usually composed almost entirely of single-lobed leaves. 


The dissected forms of leaves appear to be most plentifully developed under the influence of 
shade. In such cases they were most thickly distributed at the middle of the tree (as has been 
noted for three-lobed leaves in the Britton and Brown Flora), on young twigs whose terminal 
leaves were dissected, and toward the bottom on older twigs. There was a tendency for more 
three-lobed and less one-lobed leaves to be found on smaller twigs growing near the trunk, but 
occasionally on larger twigs, or smaller boughs growing among the larger boughs. 


No transitional forms between the three-lobed and two-lobed leaves were noted on the same 
tree. The latter apparently increase in number as the three-lobed forms decrease, and are
associated mostly with the single-lobed leaves, being about equally distributed between the
younger and older twigs. They are rarely found at the top of the tree. Evidence that the
available amount of light may play some part in the distribution of leaves is found in the fact 
that the great majority of observed seedlings growing in the shade develop the three-lobed or 
two-lobed leaves in combination. Contrast is offered by a statement made in a standard 
American textbook of botany: “In Sassafras, almost any leaf may be entire or variously lobed,


* Both series of observations also indicated how satisfactorily the normal law of distribution may 
be applied to material of this kind. 
apparently without relation to transpiration, nutrition, etc.” It will be observed that these
findings substantiate in general those of Fry made in 1902*. 


Bearing in mind the foregoing statement, an attempt was made to ascertain experimentally 
the relation of the amount of light to the kind of leaves developed. This year’s twigs bearing only
one-lobed forms were tied back into shaded positions. Of ten such cases, three twigs produced
isolated, three-lobed leaves. In another lot of the younger twigs bearing only one-lobed forms,
the leaves were stripped from the twigs, and these too tied back in the shade. Only one twig of 
this lot responded, producing two two-lobed and one three-lobed leaf. A consistent explanation 
of this fragmentary evidence would be that the formative elements for three-lobed leaves in the 
twigs are stimulated to produce those forms. A more positive point brought out is the lack of 
proliferating power in the trees under the condition of the experiments—when compared with 
other forms possessing divided leaves as the mulberry—the majority of mutilated twigs at this 
season, early August, not renewing their leaves. The writer is indebted for use of material to 
Mrs W. G. Gibson of Avalon, Pa., and Prof. W. J. Stevens, Field School, St Louis, Mo.


III. Life-History Albums. 
By ETHEL M. ELDERTON. 


The Personal and Family History Register† compiled by Dr Taylor is extremely interesting,
and if people could be persuaded to keep the records asked for and to forward the book when 
completed to some central agency such as is intended, the statistical data then available should 
be most useful. In this register under one cover all the children of one family have their life 
histories recorded, and if the individuals are to be studied only in their childhood this is an 
advantage, but if it is hoped by means of a register to provide the life history rather than the 
child history a separate volume for each child would be preferable; then as each child left the 
home the book could go with him to be continued and completed. Francis Galton in the Life-
History Album issued years ago preferred this second plan and arranged that each child in the
family should have its own album‡.

To the statistical worker in Eugenics so many problems in heredity are still unsolved, 
problems dealing with fertility, with inheritance of disease, with age at death, etc. that no record 
of personal history seems adequate which does not provide the data from which such problems 
can be attacked. In the Personal and Family History Register information as to date of birth 
and date of death is sought for parents, grandparents, great-grandparents, etc. up to the sixty-
four ancestors in the seventh generation, and such a record is interesting, but one feels that
cause of death and some information as to general health, if obtainable, would make the data 
more useful. Further there is no space assigned for collaterals. In the introduction the 
following occurs: “It is of interest to obtain data also on collaterals (uncles, aunts, cousins, etc.) 
and alliants (members by marriage). These extras can be inscribed on a page marked ‘Special 
Happenings’ or on separate sheets or cards, and placed in the pocket at the end.” Our ex-
perience is that even when a special space is provided for an entry the information required
is not always given, and I think that except in a very few cases extra data of this kind will
not be given. I am inclined to think that knowledge of the brothers and sisters of the
parents is of more importance for determining the hereditary characteristics of an individual
than knowledge of the great-grandparents. Cousins, we found, were as closely related to one 
another as grandparents to their grandchildren, and the data concerning them could be more 


* Biometrika, I. 258, Jan. 1902.
† Ourselves. A Personal and Family History Register, by John Madison Taylor, A.B., M.D.,
published by F. A. Davis Company, Philadelphia, 1917.
‡ This Album is now re-issued by the Galton Laboratory through the Cambridge University Press.
Price 9s. net. 
easily obtained and would be more reliable than those concerning individuals who lived perhaps
100 years ago. 
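The figure of sixty-four ancestors in the seventh generation follows from the doubling of ancestors at each step back. A minimal sketch, assuming the individual is counted as the first generation (so that the parents form the second); the function names are hypothetical:

```python
def ancestors_in_generation(generation: int) -> int:
    """Ancestors in a given generation, counting the individual as generation 1."""
    if generation < 1:
        raise ValueError("generation must be 1 or greater")
    return 2 ** (generation - 1)


def total_ancestors_up_to(generation: int) -> int:
    """Total ancestors from the parents (generation 2) up to the given generation."""
    return sum(ancestors_in_generation(g) for g in range(2, generation + 1))


print(ancestors_in_generation(7))  # 64
print(total_ancestors_up_to(7))    # 126
```

On this counting, a record carried back to the seventh generation would need entries for 126 direct ancestors in all, before any collateral is added.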

Personally I feel that careful details concerning the life history of a baby, interesting as they 
may be, are of little value to the student of Eugenics, unless the hereditary history is fully
given. The old difficulty of deciding the relative importance of eugenics or euthenics (the word 
used by Dr Taylor to describe the science of right living) is impossible of solution if the facts 
concerning any individual are restricted entirely to one side or the other. In Dr Taylor’s 
Register the family is crowded out by the personal element. Dr Taylor fully recognizes our 
ignorance of the laws of heredity and of the question of how far “pronouncedly unfavourable 
heredity” can be influenced by euthenics, but I think he assumes that the race can be improved
through a better environment to an extent which I venture to think is unproven. 

Both the Personal and Family History Register and the Life-History Album are rather large 
volumes, somewhat alarming to the busy parent from the very size of them. But those who are 
keenly interested in the well-being of the race will be induced to keep the record; they will be
limited in number and may belong to a rather narrow circle, at least at the present time when 
the science of eugenics is still regarded as the fad of a few individuals. 


I believe that The Record of Family Faculties issued by Francis Galton in 1884 would prove 
far more convenient both for the recorder and for the statistical worker than either of the two 
more bulky registers, the Life-History Album and the Personal and Family History Register. 
It is thirty-five years since this volume was first published and one marvels at the genius 
of the man who then saw what data would be needed to solve the problems of the present day. 
The introduction to this book supplies in a few words the justification for requiring the data
and indicates the reason for the questions asked. Thus: 


3. Age at marriage.    Total No. of sons.         No. of sons deceased.        Ages.
4. Age of husband.     Total No. of daughters.    No. of daughters deceased.   Ages.


In the introduction Francis Galton writes “The ages at marriage of the two parents, the 
number and the duration of life of the children, would enable inquiry to be made into fertility 
as associated with different admixtures of race or of disease tendencies. We have yet to learn 
the conditions under which some families are prolific in their various branches, and others die 
out.” Further Question 5 is, Mode of life so far as affecting growth or health, and the justi- 
fication for asking this question is as follows: “The mode of life, so far as it affects growth 
or health, would, if known, throw light on the effect of nurture over nature. We require to 
select the families in each of which there had been a noticeable difference in the mode of life of 
two or more of its members, and to cross divide those members into two groups, in one of 
which the mode of life had been healthy, the other in which it had been the reverse. Then by 
contrasting these groups we should see the relative effects of good and bad nurture on the 
development of body and mind, and on the health, fertility, and duration of life.” 

According to the problems with which one has come in contact, each investigator would 
desire certain modifications in the questions asked, but on the whole, I believe that a collection 
of Records of Family Faculties would enable one to determine “many vital questions in domestic
economics,” and it is very desirable that this book of Galton’s should be reissued. 


IV. The Check to the Fall in the Phthisis Death-rate since the Discovery of the Tubercle Bacillus and the Adoption of Modern Treatment.
By KARL PEARSON, F.R.S. 


In 1911* I pointed out that from ’65 to about ’95 there was a continuous and rapid fall in
the corrected phthisis death-rate, and also in the percentage which the deaths from phthisis 
were of all deaths. I further indicated that from 1895 onwards there had been a check to this 


* The Fight against Tuberculosis and the Death-rate from Phthisis, Cambridge University Press, 1911. 
rapid fall and that the curves seemed to indicate that an actual rise in the phthisis death-rate
might in the near future be reasonably anticipated. This view was rendered still more probable 
when I plotted the returns for 1910 to 1914. Since then the Great War has rendered it almost 
impossible for us so to feel our way in mortality statistics, that we can get returns comparable 
with the pre-war data. It seems to me, however, just worth while to see what our graphs 
will look like with the war years added to them. I must thank Dr Stevenson of the General 
Register Office for a renewal of his unfailing courtesy in providing me with the required data, 
and furthermore for several valuable suggestions as to the source of the remarkable results 
manifested. 


If we could trust the accompanying diagrams the anticipated rise in the phthisis death-rate 
has already occurred. But complete trust would be very much misplaced. In the first place 
our phthisis death-rate is for civilians, and since able-bodied civilians have been largely drawn 
into the army, there has naturally been a heavier death-rate of all kinds, and therefore a heavier 
phthisis death-rate than in pre-war times. There might therefore be nothing really significant 
in the marked male death-rate rise. On the other hand this explanation hardly applies to the 
rise—it is true not so marked—in the female death-rate. At the same time the whole nation, 
male and female, has been more crowded together in factories and subject to far greater strain 
than in pre-war days. This would naturally tend to emphasise the death-rate of women as well 
as of men. If we turn, however, to our second diagram we see that not only has the phthisis 
death-rate increased like the general death-rate, but it has been increasing at a more rapid rate 
than the general death-rate. This can only be accounted for on the assumption that phthisis 
more than all other diseases will be emphasised by war-strain. It can hardly be said that we 
were relieved of war-strain during 1918, indeed some of the hardest months of work and some 
of the periods of heaviest depression occurred in that year; there was further a most severe 
epidemic of influenza, and many deaths, Dr Stevenson tells me, recorded as influenza and phthisis 
were tabulated under the latter. Yet notwithstanding strain and influenza the proportion of 
phthisis deaths to deaths in general fell (see Diagram ii). 


A noteworthy feature is that the tuberculous mortality in lunatic asylums increased in an 
extraordinary manner from an average of 1800 deaths in 1912-14 to 5605 deaths in 1918. 
Dr Stevenson tells me that this will practically account for half the increase in tuberculous 
deaths for the total population in that time. Now this raises very important questions which 
ought to be answered. Were the lunatics who died of tuberculosis lunatics before the war, and 
again were they tuberculous before the war? Or did more lunatics become tuberculous owing 
to bad conditions—removal of much nursing and medical supervision—during the war? Or 
again did the tuberculous lunatics enter the asylum during the war? That is: Were the 
phthisical, simply because of their phthisis, less able to avoid mental breakdown under the 
severe war conditions? If so they would probably have died of phthisis outside the asylum 
in non-war conditions, and it would not be legitimate to cite the increased tuberculous deaths in 
asylums as something anomalous. 


On the whole it is risky to form a very definite judgment, but having regard to the female 
phthisis death-rate and to the percentage of the phthisis death-rate on the general death-rate, 
war difficulties do not seem to me sufficient to obscure the general trend of our graphs (as 
indicated before the war), namely that somewhere about 1915 the fall in the phthisis rate which 
had been less rapid since 1895 would cease altogether and probably be followed by a rise. The 
next five years will show whether this be true or not. We should expect a fall in the phthisis 
death-rate immediately, but on the average the value will remain higher than that of 1915. 
Plate

(a) Common Tern just alighting, to indicate great length of wings.

(b) The four-egg Clutch: see p. 318.

(d) Common Tern sitting. The bird is angry at the operations of the photographer. Camera about 18 inches away.

Photographs by W. Rowan